Abstract

The polarization phenomenon over any finite field $\mathbb{F}_q$, with size $q$ being a power of a prime, is considered. This problem is a generalization of the original proposal of channel polarization by Arıkan for the binary field, as well as its extension to a prime field by Şaşoğlu, Telatar, and Arıkan. In this paper, a necessary and sufficient condition on a matrix over a finite field $\mathbb{F}_q$ is shown under which any source and channel are polarized. Furthermore, the result on the speed of polarization for the binary alphabet obtained by Arıkan and Telatar is generalized to an arbitrary finite field. It is also shown that the asymptotic error probability of polar codes is improved by using the Reed-Solomon matrix, which can be regarded as a natural generalization of the $2 \times 2$ binary matrix used in the original proposal by Arıkan.
I. Introduction

Arıkan introduced the method of channel and source polarization, which gives efficient capacity-achieving binary source and channel codes, respectively [3]. Şaşoğlu et al. generalized the polarization phenomenon to non-binary alphabets whose size is a prime [4]. In this paper, we further extend the polarization phenomenon to non-binary alphabets whose size is a power of a prime by using finite fields.

The contributions of this paper are threefold. The first contribution is that we give a complete characterization as to whether an $\ell \times \ell$ matrix over a finite field gives rise to polarization. This extends the result on the binary field by Korada et al. [5] to a general finite field. The second contribution is that we characterize the asymptotic speed of polarization in terms of the matrix used. This is again an extension of the result on the binary field by Korada et al. [5] to a general finite field. The third contribution of this paper is that we provide an explicit construction of an $\ell \times \ell$ matrix, which is based on the Reed-Solomon matrix, with asymptotically the fastest polarization for $\ell \le q$.

The organization of this paper is as follows. In Section II, notations and definitions used in this paper are introduced. In Section III, the basic transform of a source and the polarization phenomenon by an $\ell \times \ell$ matrix over a finite field are introduced. In Section IV, an equivalence relation on $q$-ary sources is defined for showing equivalence among several polarization problems. Based on the concept of equivalence among sources, equivalence of matrices is considered as well. Using the equivalence of matrices, the main theorem of this paper is stated there, which gives a necessary and sufficient condition on a matrix under which any source or channel is polarized. In Section V, the Bhattacharyya parameter and its properties are shown. They are useful for proving the main theorem in Section VI and the speed of polarization in Section VII. In Section VI, a proof of the main theorem is shown. In Section VII, the speed of polarization for a general $\ell \times \ell$ matrix is proved similarly to the binary case. In Section VIII, the Reed-Solomon matrix is introduced, which yields asymptotically fast polarization in the sense discussed in Section VII. In Section IX, quaternary polar codes using the Reed-Solomon matrix are compared numerically with the original binary polar codes. Finally, Section X summarizes the paper.

II. Preliminaries 

Let $p$ be a prime number and $q := p^m$ where $m$ is a natural number. Let $\mathbb{F}_q$ be a finite field of size $q$. Let $\mathbb{F}_q^\times$ be $\mathbb{F}_q \setminus \{0\}$ and $\mathbb{F}_p(\gamma)$ be the simple extension of $\mathbb{F}_p$ generated by the adjunction of $\gamma \in \mathbb{F}_q$. Similarly, for $A \subseteq \mathbb{F}_q$ and a matrix $G$ over $\mathbb{F}_q$, $\mathbb{F}_p(A)$ and $\mathbb{F}_p(G)$ denote the field extensions of $\mathbb{F}_p$ generated by the adjunction of all elements of $A$ and of $G$, respectively. Let $\Delta_q := \{[p_1, \dots, p_q] \in \mathbb{R}_{\ge 0}^q \mid p_1 + \dots + p_q = 1\}$ denote the set of all $q$-dimensional probability vectors. For random variables $X$ on a finite set $\mathcal{X}$ of size $q$ and $Y$ on a discrete set $\mathcal{Y}$, the entropy $H(X)$ of $X$ and the conditional entropy $H(X \mid Y)$ of $X$ conditioned on $Y$ are defined as



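\[ H(X) := -\sum_{x \in \mathcal{X}} P_X(x) \log P_X(x) \]

\[ H(X \mid Y) := -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} P_{X,Y}(x, y) \log P_{X \mid Y}(x \mid y). \]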
In this paper, the base of the logarithm is assumed to be $q$ unless otherwise stated, and hence $H(X)$ and $H(X \mid Y)$ are in $[0, 1]$. If a quantity $A((X, Y))$ determined from $P_{X,Y}$ has the form $\mathbb{E}[f([P_{X \mid Y}(x \mid Y)]_{x \in \mathbb{F}_q})]$ for some $f : \Delta_q \to \mathbb{R}$, where $\mathbb{E}$ denotes the expectation, we write it as $A(X \mid Y)$. It should be noted that the arguments in this paper are directly applicable to the case where $\mathcal{Y}$ is a continuous alphabet such as $\mathbb{R}$, by replacing the summation $\sum_{y \in \mathcal{Y}}$ with the integral $\int \mathrm{d}y$. The notation $u_0^{\ell-1}$ denotes the row vector $[u_0, u_1, \dots, u_{\ell-1}]$.
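As a small numerical companion to these definitions, the following sketch evaluates the entropy and the conditional entropy with logarithms taken to base $q$, so that both land in $[0, 1]$. The function names and the toy joint law (a $q$-ary erasure-type observation with $q = 4$) are illustrative assumptions, not part of the paper.

```python
import math

def entropy(p_x, q):
    """H(X) in base q for a probability vector p_x of length q."""
    return -sum(p * math.log(p, q) for p in p_x if p > 0)

def conditional_entropy(p_xy, q):
    """H(X | Y) in base q; p_xy[x][y] is the joint probability P_{X,Y}(x, y)."""
    n_y = len(p_xy[0])
    h = 0.0
    for y in range(n_y):
        p_y = sum(p_xy[x][y] for x in range(q))
        for x in range(q):
            p = p_xy[x][y]
            if p > 0:
                h -= p * math.log(p / p_y, q)
    return h

q = 4
uniform = [1 / q] * q
deterministic = [1.0, 0.0, 0.0, 0.0]
# Assumed toy source: X is uniform; Y equals X with probability 1/2,
# and equals an erasure symbol (column index q) otherwise.
p_xy = [[(0.5 / q if y == x else 0.0) for y in range(q)] + [0.5 / q]
        for x in range(q)]

h_uniform = entropy(uniform, q)
h_det = entropy(deterministic, q)
h_cond = conditional_entropy(p_xy, q)
```

With base-$q$ logarithms the uniform distribution has entropy 1, a deterministic one has entropy 0, and the toy erasure source has conditional entropy equal to its erasure probability.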



III. Source and channel polarization 
A. Source and channel polarization phenomenon 

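In this paper, we consider source polarization by an $\ell \times \ell$ invertible matrix $G$ over $\mathbb{F}_q$. Let a $q$-ary source $(X, Y)$ be defined as a pair of random variables on $\mathbb{F}_q \times \mathcal{Y}$. We first introduce a basic transform of a source, $(X, Y) \to \{(X^{(i)}, Y^{(i)})\}_{i=0,\dots,\ell-1}$. Let $\{(X_i, Y_i)\}_{i=0,\dots,\ell-1}$ be $\ell$ independent drawings of $(X, Y)$. Let $U_0^{\ell-1}$ be a random vector defined by the equation $X_0^{\ell-1} = U_0^{\ell-1} G$. Letting $(X^{(i)}, Y^{(i)}) := (U_i, (U_0^{i-1}, Y_0^{\ell-1}))$ for $i = 0, \dots, \ell-1$ defines the basic transform $(X, Y) \to \{(X^{(i)}, Y^{(i)})\}_{i=0,\dots,\ell-1}$, where the random pair $(X^{(i)}, Y^{(i)})$ takes values in $\mathbb{F}_q \times (\mathbb{F}_q^i \times \mathcal{Y}^\ell)$. From the chain rule for the entropy, one has

\[ \ell H(X \mid Y) = H(X_0^{\ell-1} \mid Y_0^{\ell-1}) = H(U_0^{\ell-1} \mid Y_0^{\ell-1}) = \sum_{i=0}^{\ell-1} H(U_i \mid U_0^{i-1}, Y_0^{\ell-1}) = \sum_{i=0}^{\ell-1} H(X^{(i)} \mid Y^{(i)}). \tag{1} \]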
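The conservation of entropy under the basic transform can be checked by brute force on a small example. The sketch below assumes a prime field ($q = 3$, so field arithmetic is arithmetic modulo 3), a hypothetical joint law `P_XY`, and the $2 \times 2$ lower triangular matrix with off-diagonal entry 2; none of these specific values come from the paper.

```python
import math
from collections import defaultdict
from itertools import product

Q = 3  # prime field: arithmetic is integer arithmetic mod Q

# Hypothetical source: joint law P_{X,Y}(x, y) on F_3 x {0, 1}
P_XY = {(0, 0): 0.30, (1, 0): 0.10, (2, 0): 0.05,
        (0, 1): 0.05, (1, 1): 0.20, (2, 1): 0.30}
G = [[1, 0], [2, 1]]  # 2 x 2 invertible matrix over F_3

def cond_entropy(joint):
    """H(A | B) in base Q from a dict {(a, b): probability}."""
    marg = defaultdict(float)
    for (_, b), p in joint.items():
        marg[b] += p
    return -sum(p * math.log(p / marg[b], Q)
                for (_, b), p in joint.items() if p > 0)

def basic_transform_entropies(p_xy, g):
    """Return [H(U_i | U_0^{i-1}, Y_0^{l-1})] for i = 0..l-1 with X = U G."""
    ell = len(g)
    ys = sorted({y for _, y in p_xy})
    joints = [defaultdict(float) for _ in range(ell)]
    for u in product(range(Q), repeat=ell):
        # x = u G over F_Q
        x = [sum(u[k] * g[k][j] for k in range(ell)) % Q for j in range(ell)]
        for y in product(ys, repeat=ell):
            p = math.prod(p_xy[(x[j], y[j])] for j in range(ell))
            for i in range(ell):
                joints[i][(u[i], (u[:i], y))] += p
    return [cond_entropy(j) for j in joints]

h = cond_entropy(P_XY)
h_split = basic_transform_entropies(P_XY, G)
```

The two transformed entropies sum to $\ell H(X \mid Y)$ as in (1), with the first one no smaller and the second one no larger than $H(X \mid Y)$ for this lower triangular matrix.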

By starting with a source $(X, Y)$ and recursively applying the basic transform to depth $n$, we obtain $\ell^n$ random pairs $\{(X^{(b_1)\cdots(b_n)}, Y^{(b_1)\cdots(b_n)})\}_{(b_1,\dots,b_n) \in \{0,\dots,\ell-1\}^n}$.¹ Let $B_1, \dots, B_n, \dots$ be independent uniform random variables on $\{0, \dots, \ell-1\}$. Let $(X_n, Y_n) := (X^{(B_1)\cdots(B_n)}, Y^{(B_1)\cdots(B_n)})$ for $n \in \{0, 1, \dots\}$. A random sequence $\{H_n : \sigma(B_1, \dots, B_n)\text{-measurable}\}_{n=0,1,\dots}$ is defined as $H_n := H(X_n \mid Y_n)$, where the conditional entropy does not take account of the randomness of $(B_1, \dots, B_n)$. From the chain rule (1) for the entropy, the random sequence $\{H_n\}_{n=0,1,\dots}$ is shown to be a martingale, i.e., $\mathbb{E}[H_n \mid B_1, \dots, B_{n-1}] = H_{n-1}$. Then, noting that the sequence $\{H_n\}_{n=0,1,\dots}$ is bounded in the interval $[0, 1]$, from the martingale convergence theorem, there exists a random variable $H_\infty$ such that $H_n$ converges to $H_\infty$ almost surely. The source polarization is defined in terms of $H_\infty$ as in the following definition.

Definition 1 (Polarization). A source $(X, Y)$ is said to be polarized by $G$ if and only if

\[ H_\infty = \begin{cases} 0, & \text{with probability } 1 - H(X \mid Y) \\ 1, & \text{with probability } H(X \mid Y). \end{cases} \tag{2} \]

It should be noted that if $H_\infty$ is $\{0, 1\}$-valued, the probability of $H_\infty = 1$ is necessarily equal to $H(X \mid Y)$ because of the martingale property $\mathbb{E}[H_n \mid H_0] = H_0 = H(X \mid Y)$.

When the marginal distribution of $X$ is uniform, the source polarization is called the channel polarization. As shown in Section IV, the source polarization problem is also translated into the channel polarization problem. We therefore use the terms "source" and "channel" almost interchangeably, unless otherwise stated. As the first and main contribution of this paper, we show a necessary and sufficient condition on $G$ under which any source or channel is polarized. Let

\[ G_\gamma := \begin{bmatrix} 1 & 0 \\ \gamma & 1 \end{bmatrix} \]

over $\mathbb{F}_q$ where $\gamma \in \mathbb{F}_q^\times$. Arıkan proved for the case $q = 2$ that the matrix $G_1$ polarizes any source/channel [3], [6]. Şaşoğlu et al. generalized the result to prime fields [4]. They also showed that for the matrix $G_1$ over the ring $\mathbb{Z}/q\mathbb{Z}$ where $q$ is not a prime, there is a counterexample of a non-polarizing $q$-ary channel. Their counterexample also works for $\mathbb{F}_q$ whose size $q$ is not a prime. The purpose of this paper is to generalize these results to any matrix over any finite field.



B. Construction of source and channel codes 

The polar code for source/channel coding is based on the polarization phenomenon. In this subsection, a rough sketch of the construction of the polar code for channel coding is described. Given an $\ell \times \ell$ invertible matrix $G$ which appears in the previous subsection, we first consider the $\ell^n \times \ell^n$ matrix $G^{\otimes n}$, where ${}^{\otimes n}$ denotes the Kronecker power. For $i \in \{0, 1, \dots, \ell^n - 1\}$, $i_n i_{n-1} \cdots i_1$ denotes the $\ell$-ary expansion of $i$. Then, the generator matrix of a polar code is, roughly speaking, obtained from $G^{\otimes n}$ by choosing rows with indices² in the set

\[ \{ i \in \{0, \dots, \ell^n - 1\} \mid H(X^{(i_1)\cdots(i_n)} \mid Y^{(i_1)\cdots(i_n)}) < \epsilon \} \]

with some threshold $\epsilon > 0$. If a channel $(X, Y)$ is polarized by $G$, the ratio of chosen rows is asymptotically $1 - H(X \mid Y)$ for any fixed $\epsilon \in (0, 1)$. For detailed descriptions of encoding and decoding algorithms, see [3] for the channel coding and [6] for the source coding.
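The row selection can be illustrated on a case where the entropies are available in closed form. The sketch below assumes a $q$-ary erasure channel with erasure probability `eps` and a $2 \times 2$ kernel $G_\gamma$ with $\gamma \neq 0$; for this channel (an assumption made only for illustration, not a construction from the paper) the base-$q$ conditional entropy equals the erasure probability, and one step of the basic transform maps it to $2\epsilon - \epsilon^2$ for $U_0$ and to $\epsilon^2$ for $U_1$. The digit $i_1$ applied first is stored as the least significant digit of $i$, following the $\ell$-ary expansion used above.

```python
def erasure_entropies(eps, n):
    """Entropies H(X^{(i_1)...(i_n)} | Y^{(i_1)...(i_n)}) for a q-ary erasure
    channel under a 2 x 2 kernel; list index i has binary expansion i_n ... i_1."""
    h = [eps]
    for _ in range(n):
        # new digit 0: U_0 is recovered only if both outputs survive
        # new digit 1: U_1 is lost only if both outputs are erased
        # the new digit becomes the most significant digit of the index
        h = [2 * e - e * e for e in h] + [e * e for e in h]
    return h

def select_rows(eps, n, threshold):
    """Indices of rows of the n-fold Kronecker power kept by the threshold rule."""
    h = erasure_entropies(eps, n)
    return [i for i, e in enumerate(h) if e < threshold]

eps, n = 0.5, 10
h = erasure_entropies(eps, n)
rows = select_rows(eps, n, threshold=1e-3)
rate = len(rows) / len(h)
```

The average of the entropies stays equal to `eps` at every depth, and the fraction of selected rows approaches $1 - \epsilon$ from below as $n$ grows.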

¹ Joint distribution of these random pairs is not considered in this paper.
² Row and column indices of matrices start with 0 rather than 1.
IV. Equivalence relation on sources and main theorem

In order to deal with a source $(X, Y)$ in terms of the polarization phenomenon, it is useful to define an equivalence relation up to which we do not have to distinguish sources. An equivalence relation $(X, Y) \sim (X', Y')$ which is desirable for our purpose has to satisfy the following two conditions.

\[ (X, Y) \sim (X', Y') \implies H(X \mid Y) = H(X' \mid Y') \tag{3} \]
\[ (X, Y) \sim (X', Y') \implies (X^{(i)}, Y^{(i)}) \sim (X'^{(i)}, Y'^{(i)}) \quad \text{for } i = 0, 1, \dots, \ell-1 \tag{4} \]

The second condition (4) should be satisfied for any $\ell \times \ell$ invertible matrix $G$. The significance of these two conditions is that sources which are equivalent in the above sense yield the same random sequence $\{H_n\}_{n=0,1,\dots}$, thereby behaving exactly the same as far as the polarization phenomenon is concerned.

Given a source $(X, Y)$, the a posteriori distribution $[P_{X \mid Y}(x \mid y)]_{x \in \mathbb{F}_q} \in \Delta_q$ plays a fundamental role, in particular in determining the conditional entropy $H(X \mid Y)$ and other relevant quantities. We first introduce two equivalence relations on probability vectors.

Definition 2. For $p_0^{q-1} \in \Delta_q$ and $p'^{\,q-1}_0 \in \Delta_q$, we say $p_0^{q-1} \overset{p}{\sim} p'^{\,q-1}_0$ if and only if there exists a permutation matrix $\sigma$ such that $p_0^{q-1} = p'^{\,q-1}_0 \sigma$. For any $s \in \mathbb{N}$, $[p_x]_{x \in \mathbb{F}_q^s} \in \Delta_{q^s}$ and $[p'_x]_{x \in \mathbb{F}_q^s} \in \Delta_{q^s}$, we say $[p_x]_{x \in \mathbb{F}_q^s} \overset{a}{\sim} [p'_x]_{x \in \mathbb{F}_q^s}$ if and only if there exists $z \in \mathbb{F}_q^s$ such that $p_x = p'_{x+z}$ for all $x \in \mathbb{F}_q^s$.

It is straightforward to see that

\[ [p_x]_{x \in \mathbb{F}_q^s} \overset{a}{\sim} [p'_x]_{x \in \mathbb{F}_q^s} \iff [p_{xH}]_{x \in \mathbb{F}_q^s} \overset{a}{\sim} [p'_{xH}]_{x \in \mathbb{F}_q^s} \tag{5} \]

holds for any $s \times s$ invertible matrix $H$ since $p_x = p'_{x+z} \iff p_{xH} = p'_{xH+zH}$ for any $z \in \mathbb{F}_q^s$.

The $q$-dimensional random vector $[P_{X \mid Y}(x \mid Y)]_{x \in \mathbb{F}_q} \in \Delta_q$ induces a probability measure on $\Delta_q$. If two random vectors $[P_{X \mid Y}(x \mid Y)]_{x \in \mathbb{F}_q}$ and $[P_{X' \mid Y'}(x \mid Y')]_{x \in \mathbb{F}_q}$ defined from sources $(X, Y)$ on $\mathbb{F}_q \times \mathcal{Y}$ and $(X', Y')$ on $\mathbb{F}_q \times \mathcal{Y}'$, respectively, induce the same probability measure on $\Delta_q$, we say $(X, Y) \sim (X', Y')$. In this case, $A(X \mid Y) = A(X' \mid Y')$ holds for any quantity of the form $A(X \mid Y) = \mathbb{E}[f([P_{X \mid Y}(x \mid Y)]_{x \in \mathbb{F}_q})]$, and hence the condition (3) is satisfied. Furthermore, the equivalence relation $\sim$ obviously satisfies (4). However, a weaker equivalence relation than $\sim$ exists which satisfies both of the conditions (3) and (4). First, a weak equivalence relation which only satisfies the condition (3) is defined as follows.

Definition 3. For sources $(X, Y)$ on $\mathbb{F}_q \times \mathcal{Y}$ and $(X', Y')$ on $\mathbb{F}_q \times \mathcal{Y}'$, we say $(X, Y) \overset{p}{\sim} (X', Y')$ if and only if the $q$-dimensional random vector $[P_{X \mid Y}(x \mid Y)]_{x \in \mathbb{F}_q}$ induces the same distribution on $\Delta_q / \overset{p}{\sim}$ as the random vector $[P_{X' \mid Y'}(x \mid Y')]_{x \in \mathbb{F}_q}$. For a function $f : \Delta_q \to \mathbb{R}$ which is invariant under any permutation of its arguments, a quantity $\mathbb{E}[f([P_{X \mid Y}(x \mid Y)]_{x \in \mathbb{F}_q})]$ is said to be invariant under any permutation of symbols in a posteriori distribution.

The equivalence $(X, Y) \overset{p}{\sim} (X', Y')$ implies $A(X \mid Y) = A(X' \mid Y')$ for any quantity $A(X \mid Y)$ invariant under any permutation of symbols in a posteriori distribution, including the conditional entropy $H(X \mid Y)$. Hence, the equivalence relation $\overset{p}{\sim}$ satisfies the first condition (3). However, the equivalence relation $\overset{p}{\sim}$ does not satisfy the second condition (4). The equivalence relation $\overset{a}{\sim}$ defined in the following is weaker than $\sim$ and satisfies both of the conditions (3) and (4).

Definition 4. Let $s \in \mathbb{N}$. For pairs of random variables $(X, Y)$ on $\mathbb{F}_q^s \times \mathcal{Y}$ and $(X', Y')$ on $\mathbb{F}_q^s \times \mathcal{Y}'$, we say $(X, Y) \overset{a}{\sim} (X', Y')$ if and only if there exists $r \in \mathbb{F}_q^\times$ such that the $q^s$-dimensional random vector $[P_{X \mid Y}(rx \mid Y)]_{x \in \mathbb{F}_q^s}$ induces the same distribution on $\Delta_{q^s} / \overset{a}{\sim}$ as $[P_{X' \mid Y'}(x \mid Y')]_{x \in \mathbb{F}_q^s}$.

It is not hard to confirm the properties $(X, Y) \sim (X', Y') \implies (X, Y) \overset{a}{\sim} (X', Y')$ and $(X, Y) \overset{a}{\sim} (X', Y') \implies (X, Y) \overset{p}{\sim} (X', Y')$. From the latter property, it holds that $(X, Y) \overset{a}{\sim} (X', Y') \implies H(X \mid Y) = H(X' \mid Y')$, implying that the equivalence relation $\overset{a}{\sim}$ satisfies the first condition (3). The equivalence relation $\overset{a}{\sim}$ also satisfies the second condition (4).

Lemma 5.

\[ (X, Y) \overset{a}{\sim} (X', Y') \implies (X^{(i)}, Y^{(i)}) \overset{a}{\sim} (X'^{(i)}, Y'^{(i)}) \quad \text{for } i = 0, 1, \dots, \ell-1 \]

for an arbitrary $\ell \times \ell$ invertible matrix $G$.

Proof: For a source $(X, Y)$, let $X_0^{\ell-1}$, $Y_0^{\ell-1}$ and $U_0^{\ell-1}$ be what appear in the definition of its basic transform. The random variables $X'^{\,\ell-1}_0$, $Y'^{\,\ell-1}_0$ and $U'^{\,\ell-1}_0$ are defined in the same way for $(X', Y')$. The equivalence relation $(X, Y) \overset{a}{\sim} (X', Y')$ between sources $(X, Y)$ and $(X', Y')$ immediately leads to the equivalence $(X_0^{\ell-1}, Y_0^{\ell-1}) \overset{a}{\sim} (X'^{\,\ell-1}_0, Y'^{\,\ell-1}_0)$ between their $\ell$th-order extensions. From (5) and the identity $(rx)G^{-1} = r(xG^{-1})$ for any $r \in \mathbb{F}_q$ and $x \in \mathbb{F}_q^\ell$, it holds that $(X_0^{\ell-1} G^{-1}, Y_0^{\ell-1}) \overset{a}{\sim} (X'^{\,\ell-1}_0 G^{-1}, Y'^{\,\ell-1}_0)$, or equivalently, $(U_0^{\ell-1}, Y_0^{\ell-1}) \overset{a}{\sim} (U'^{\,\ell-1}_0, Y'^{\,\ell-1}_0)$. One therefore obtains $(U_i, (U_0^{i-1}, Y_0^{\ell-1})) \overset{a}{\sim} (U'_i, (U'^{\,i-1}_0, Y'^{\,\ell-1}_0))$. ∎

The equivalence relation $\overset{a}{\sim}$ gives rise to the following several useful lemmas.
Lemma 6 (Source-channel equivalence). Let $(N, Z)$ be a random pair on $\mathbb{F}_q \times \mathcal{Y}$ and $X$ be a uniform random variable on $\mathbb{F}_q$ which is independent of $(N, Z)$. Then, it holds that $(N, Z) \overset{a}{\sim} (X, (X + N, Z))$.

Proof: One has $(X, (X + N, Z)) \overset{a}{\sim} (-X + (X + N), (X + N, Z)) = (N, (X + N, Z)) \sim (N, Z)$, where the last equivalence relation is due to the assumptions on $X$. ∎

The channel $(X, (X + N, Z))$ in Lemma 6 is a symmetric channel in the following sense.

Definition 7 (Symmetric channel). A channel $(X, Y)$ on $\mathbb{F}_q \times \mathcal{Y}$ is said to be symmetric if and only if there exists a permutation $\sigma_x$ on $\mathcal{Y}$ for each $x \in \mathbb{F}_q$ such that $P_{Y \mid X}(y \mid x) = P_{Y \mid X}(\sigma_{x'-x}(y) \mid x')$ for any $y \in \mathcal{Y}$ and $x, x' \in \mathbb{F}_q$.

The symmetricity is preserved under the basic transform.

Lemma 8. For a symmetric channel $(X, Y)$, $(X^{(i)}, Y^{(i)})$ is symmetric for any $i \in \{0, \dots, \ell-1\}$.

Proof: The statement holds since

\[ P_{U_0^{\ell-1}, Y_0^{\ell-1}}\big((u_0^{i-1}, u_i, u_{i+1}^{\ell-1}), y_0^{\ell-1}\big) = P_{U_0^{\ell-1}, Y_0^{\ell-1}}\big((u_0^{i-1}, u'_i, u_{i+1}^{\ell-1}), w_0^{\ell-1}\big) \]

where $w_j = \sigma_{G_{ij}(u'_i - u_i)}(y_j)$. ∎

Lemma 9. For any channel $(X, Y)$ and any symmetric channel $(X', Y')$, let $(Z, (Y, Y'))$ and $(Z', (Y, Y'))$ be the channels defined by letting $Z = X = X'$ and $Z' = X = X' + a$ for any fixed $a \in \mathbb{F}_q$, respectively. For these channels, it holds that $(Z, (Y, Y')) \overset{a}{\sim} (Z', (Y, Y'))$.

Proof: The equality $P_{Z, (Y, Y')}(z, (y, y')) = P_{Z', (Y, Y')}(z, (y, \sigma_{-a}(y')))$ implies $(Z, (Y, Y')) \overset{a}{\sim} (Z', (Y, Y'))$. ∎
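We say that $\ell \times \ell$ invertible matrices $G$ and $\bar{G}$ are equivalent when $(X^{(i)}, Y^{(i)}) \overset{a}{\sim} (\bar{X}^{(i)}, \bar{Y}^{(i)})$ for $i = 0, 1, \dots, \ell-1$, where $\{(X^{(i)}, Y^{(i)})\}_{i=0,\dots,\ell-1}$ and $\{(\bar{X}^{(i)}, \bar{Y}^{(i)})\}_{i=0,\dots,\ell-1}$ are two sets of $\ell$ random pairs generated from an arbitrary common source $(X, Y)$ via the basic transform using matrices $G$ and $\bar{G}$, respectively.

Lemma 10. Let $G$ and $V$ be an $\ell \times \ell$ invertible matrix and an $\ell \times \ell$ invertible upper triangular matrix, respectively. Then, $G$ and $VG$ are equivalent.

Proof: Since $X_0^{\ell-1} = U_0^{\ell-1} V G \iff X_0^{\ell-1} G^{-1} = U_0^{\ell-1} V =: U'^{\,\ell-1}_0$, the equivalence $(U_i, (U_0^{i-1}, Y_0^{\ell-1})) \overset{a}{\sim} (U'_i, (U'^{\,i-1}_0, Y_0^{\ell-1}))$ implies the lemma. ∎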
Obviously, a permutation of columns of $G$ does not change $(X^{(i)}, Y^{(i)})$ up to the equivalence $\overset{a}{\sim}$ for $i = 0, 1, \dots, \ell-1$, so that $G$ and its column permutation are equivalent. Hence, without loss of generality, one can assume that $G$ is a lower triangular matrix. Lower triangular matrices with unit diagonal elements equivalent to $G$ are called standard forms of $G$. A standard form of $G$ is not generally unique. Note that the standard forms of $G_\gamma$ are

\[ \begin{bmatrix} 1 & 0 \\ \gamma & 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 0 \\ \gamma^{-1} & 1 \end{bmatrix}. \]

If there exists the identity matrix as a standard form of $G$, it is the unique standard form of $G$. In this case, one obviously has the identity $(X^{(i)}, Y^{(i)}) \sim (X, Y)$ for all $i \in \{0, \dots, \ell-1\}$, implying that $G$ does not polarize any source. For other cases, the following main theorem shows necessary and sufficient conditions on $G$ under which any source is polarized.

Theorem 11. The following are equivalent for an $\ell \times \ell$ invertible matrix $G$ over $\mathbb{F}_q$ with a non-identity standard form.

• Any $q$-ary source is polarized by $G$.

• It holds that $\mathbb{F}_p(\bar{G}) = \mathbb{F}_q$ for any standard form $\bar{G}$ of $G$.

• It holds that $\mathbb{F}_p(\bar{G}) = \mathbb{F}_q$ for one of the standard forms $\bar{G}$ of $G$.

Corollary 12. Any $q$-ary source is polarized by the $2 \times 2$ matrix $G_\gamma$ over $\mathbb{F}_q$ with $\gamma \in \mathbb{F}_q^\times$ if and only if $\mathbb{F}_p(\gamma) = \mathbb{F}_q$.
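The condition $\mathbb{F}_p(\gamma) = \mathbb{F}_q$ of Corollary 12 can be tested mechanically: $\gamma$ generates $\mathbb{F}_{p^m}$ over $\mathbb{F}_p$ exactly when the smallest $k \ge 1$ with $\gamma^{p^k} = \gamma$ equals $m$. The sketch below does this for $\mathbb{F}_{16}$; the representation $\mathbb{F}_2[x]/(x^4 + x + 1)$ with elements stored as 4-bit integers, and the function names, are choices made here for illustration.

```python
def gf16_mul(a, b):
    """Multiply in F_16 = F_2[x]/(x^4 + x + 1); elements are 4-bit integers."""
    r = 0
    for i in range(4):
        if (b >> i) & 1:
            r ^= a << i              # carry-less (polynomial) multiplication
    for i in range(6, 3, -1):        # reduce the terms of degree 6, 5, 4
        if (r >> i) & 1:
            r ^= 0b10011 << (i - 4)
    return r

def degree_over_f2(gamma):
    """Smallest k >= 1 with gamma^(2^k) == gamma, i.e. [F_2(gamma) : F_2]."""
    k, t = 1, gf16_mul(gamma, gamma)
    while t != gamma:
        t = gf16_mul(t, t)           # apply the Frobenius map once more
        k += 1
    return k

# gamma values for which F_2(gamma) = F_16, i.e. the condition of Corollary 12
polarizing = [g for g in range(1, 16) if degree_over_f2(g) == 4]
```

In this representation the nonzero elements that fail the condition are 1, 6 and 7, which together with 0 form the subfield $\mathbb{F}_4$ of $\mathbb{F}_{16}$; the remaining 12 elements satisfy it.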

Note that the identity matrix is the standard form of an invertible matrix $G$ if and only if there exists an upper triangular matrix as a column permutation of $G$. Thus, Theorem 11 includes the known results that an invertible matrix $G$ is polarizing if and only if any column permutation of $G$ is not upper triangular for $q = 2$ [5, Lemma 1] and for $q$ prime.



V. Bhattacharyya parameter 

The Bhattacharyya parameter is useful both for proving the polarization phenomenon and for evaluating the asymptotic speed of polarization. In this section, it is shown that polarization of the Bhattacharyya parameter and polarization of the conditional entropy are equivalent. Let $(\Omega := \{1, \dots, q\}, 2^\Omega, P)$ be a probability space. The probability measure $P$ can be represented by the vector $[\sqrt{P(1)}, \dots, \sqrt{P(q)}] \in S_q$ where $S_q := \{[p_1, \dots, p_q] \in \mathbb{R}_{\ge 0}^q \mid p_1^2 + \dots + p_q^2 = 1\}$. The $L_p$ norm of $x \in \mathbb{C}^q$ is defined as $L_p(x) := (|x_1|^p + \dots + |x_q|^p)^{1/p}$ for any $p \ge 1$. The $L_1$ norm of $p \in S_q$ attains the minimum 1 at the deterministic distributions, i.e., the distributions of the form $[0, \dots, 0, 1, 0, \dots, 0]$, and the maximum $\sqrt{q}$ at the uniform distribution, represented by $u := [1/\sqrt{q}, \dots, 1/\sqrt{q}] \in S_q$. On the other hand, the deterministic and uniform distributions also minimize and maximize the entropy $H(p) := -\sum_i p_i^2 \log p_i^2$ of $p \in S_q$, respectively.

The following lemma states that closeness of a probability distribution to determinism or uniformity measured in terms of its entropy value is equivalent to that measured in terms of its $L_1$-norm value.
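Lemma 13. For any $\epsilon > 0$, there exists $\delta > 0$ such that

\[ \{p \in S_q \mid H(p) < \delta\} \subseteq \{p \in S_q \mid L_1(p) - 1 < \epsilon\} \tag{6} \]
\[ \{p \in S_q \mid L_1(p) - 1 < \delta\} \subseteq \{p \in S_q \mid H(p) < \epsilon\} \tag{7} \]
\[ \{p \in S_q \mid 1 - H(p) < \delta\} \subseteq \{p \in S_q \mid \sqrt{q} - L_1(p) < \epsilon\} \tag{8} \]
\[ \{p \in S_q \mid \sqrt{q} - L_1(p) < \delta\} \subseteq \{p \in S_q \mid 1 - H(p) < \epsilon\}. \tag{9} \]

Proof: Since

\[ L_2(u - p)^2 = \sum_i \left( \frac{1}{\sqrt{q}} - p_i \right)^2 = 2 - \frac{2}{\sqrt{q}} \sum_i p_i = \frac{2}{\sqrt{q}} \left( \sqrt{q} - L_1(p) \right), \]

(9) is a consequence of continuity of $H(p)$. The relationship (8) follows from

\[ 1 - H(p) = 1 + \sum_{i=1}^{q} p_i^2 \log p_i^2 = -2 \sum_{i=1}^{q} p_i^2 \log \frac{1}{\sqrt{q}\, p_i} \ge \frac{2}{\log_e q} \sum_{i=1}^{q} p_i^2 \left( 1 - \frac{1}{\sqrt{q}\, p_i} \right) = \frac{2}{\sqrt{q} \log_e q} \left( \sqrt{q} - L_1(p) \right). \]

Since $H(p) = 2 \sum_i p_i^2 \log(1/p_i) \le 2 \log \sum_i p_i = 2 \log L_1(p)$, the relationship (7) holds. Since $H(p) \log_e q = -\sum_i p_i^2 \log_e p_i^2 \ge -\log_e \max_i p_i^2 \ge 1 - \max_i p_i^2 \ge (L_1(p) - 1)^2 / (q - 1)$ (see [22] for the last inequality), the relationship (6) holds. ∎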

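Hence, the entropy is close to 0 and 1 if and only if the $L_1$ norm is close to 1 and $\sqrt{q}$, respectively. From Lemma 13 and the following observation of inequalities about the expectation of the square of the $L_1$ norm,

\[ 0 \le \frac{1}{q-1} \left( \sum_{y \in \mathcal{Y}} P_Y(y) \Bigg( \sum_{x \in \mathbb{F}_q} \sqrt{P_{X \mid Y}(x \mid y)} \Bigg)^{2} - 1 \right) = \frac{1}{q-1} \sum_{\substack{x, x' \in \mathbb{F}_q \\ x \ne x'}} \sum_{y \in \mathcal{Y}} P_Y(y) \sqrt{P_{X \mid Y}(x \mid y) P_{X \mid Y}(x' \mid y)} \le 1 \]

for a random pair $(X, Y)$, the conditional entropy $H(X \mid Y)$ is close to 0 and 1 if and only if the Bhattacharyya parameter $Z(X \mid Y) \in [0, 1]$ for $(X, Y)$, defined as

\[ Z(X \mid Y) := \frac{1}{q-1} \sum_{\substack{x \in \mathbb{F}_q,\, x' \in \mathbb{F}_q \\ x \ne x'}} \sum_{y \in \mathcal{Y}} P_Y(y) \sqrt{P_{X \mid Y}(x \mid y) P_{X \mid Y}(x' \mid y)} \]

is close to 0 and 1, respectively. Obviously, $Z(X \mid Y)$ is invariant under any permutation of symbols in a posteriori distribution of $(X, Y)$. For $d \in \mathbb{F}_q^\times$, we define $Z_d(X \mid Y) \in [0, 1]$ as

\[ Z_d(X \mid Y) := \sum_{x \in \mathbb{F}_q} \sum_{y \in \mathcal{Y}} P_Y(y) \sqrt{P_{X \mid Y}(x \mid y) P_{X \mid Y}(x + d \mid y)}. \]

The Bhattacharyya parameter $Z(X \mid Y)$ can be expressed as the average of $Z_d(X \mid Y)$:

\[ Z(X \mid Y) = \frac{1}{q-1} \sum_{d \in \mathbb{F}_q^\times} Z_d(X \mid Y). \]

Hence, $Z(X \mid Y)$ is close to 0 and 1 if and only if $Z_d(X \mid Y)$ is simultaneously close to 0 and 1 for all $d \in \mathbb{F}_q^\times$, respectively. Note that the Rényi entropy, which is similar to the $L_p$ norm, also explains the Bhattacharyya parameter.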
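The relation between the Bhattacharyya parameter and its per-shift versions can be checked numerically. The following sketch assumes a prime field ($q = 5$), so that $x + d$ is addition modulo 5, and a hypothetical two-output source; the numbers are illustrative only.

```python
import math

Q = 5  # prime, so x + d in the field is (x + d) % Q

# Hypothetical source: P_Y[y] and posterior rows P_X_GIVEN_Y[y][x]
P_Y = [0.4, 0.6]
P_X_GIVEN_Y = [[0.50, 0.20, 0.15, 0.10, 0.05],
               [0.05, 0.05, 0.10, 0.30, 0.50]]

def z_d(d):
    """Z_d(X | Y): overlap between the posterior and its shift by d."""
    return sum(P_Y[y] * math.sqrt(row[x] * row[(x + d) % Q])
               for y, row in enumerate(P_X_GIVEN_Y) for x in range(Q))

def z():
    """Z(X | Y): normalized sum over all ordered pairs x != x'."""
    total = sum(P_Y[y] * math.sqrt(row[x] * row[xp])
                for y, row in enumerate(P_X_GIVEN_Y)
                for x in range(Q) for xp in range(Q) if x != xp)
    return total / (Q - 1)

z_all = z()
z_by_shift = [z_d(d) for d in range(1, Q)]
```

Averaging `z_by_shift` reproduces `z_all`, and the values for $d$ and $-d$ coincide because the sum over $x$ is invariant under the substitution $x \to x - d$.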



VI. Proof of the main theorem 

A. Sketch 

In this section, the proof of Theorem 11 is shown. In Section VI-B, it is proved that if there exists a standard form $\bar{G}$ of $G$ such that $\mathbb{F}_p(\bar{G}) \ne \mathbb{F}_q$, there exists a source which is not polarized by $G$. It means that if any source is polarized by $G$, any standard form $\bar{G}$ of $G$ satisfies $\mathbb{F}_p(\bar{G}) = \mathbb{F}_q$. In Section VI-C, it is proved that if there exists a standard form $\bar{G}$ of $G$ such that $\mathbb{F}_p(\bar{G}) = \mathbb{F}_q$, any source is polarized by $G$. This completes the proof of Theorem 11.

Fig. 1. Tanner-graph representations of the relationship among $U_0^{\ell-1}$, $X_0^{\ell-1}$ and $Y_0^{\ell-1}$ with a standard form $\bar{G}$. (a) Complete representation. (b) Representation under conditioning.



B. Necessity 

Let $\bar{G}$ be an arbitrary standard form of $G$. Assume $\mathbb{F}_p(\bar{G}) \ne \mathbb{F}_q$. Let $M := [\mathbb{F}_q : \mathbb{F}_p(\bar{G})]$ be the degree of the field extension $\mathbb{F}_q / \mathbb{F}_p(\bar{G})$. Fix a basis of $\mathbb{F}_q$ as a linear space over $\mathbb{F}_p(\bar{G})$. Each $x \in \mathbb{F}_q$ is naturally identified with an $M$-dimensional vector over $\mathbb{F}_p(\bar{G})$ according to the fixed basis. Let $[V_0, \dots, V_{M-1}] \in \mathbb{F}_p(\bar{G})^M$ be a random vector corresponding to $X \in \mathbb{F}_q$ according to the identification between $\mathbb{F}_p(\bar{G})^M$ and $\mathbb{F}_q$. If one takes a source $(X, Y)$ for which $V_0, \dots, V_{M-1}$ are independent conditioned on $Y$, recursive application of the basic transform to the source $(X, Y)$ affects $V_i$ separately for $i \in \{0, \dots, M-1\}$, i.e., one can regard the polarization process of the source $(X, Y)$ as a collection of $M$ independent polarization processes $\{(V_{i,n}, Y_n) := (V_i^{(B_1)\cdots(B_n)}, Y^{(B_1)\cdots(B_n)})\}_{n=0,1,\dots}$, $i = 0, \dots, M-1$. In this case, if $H(V_i \mid Y)$ is not constant among all $i \in \{0, \dots, M-1\}$, the source $([V_0, \dots, V_{M-1}], Y)$ cannot be polarized in principle. Note that the situation is essentially equivalent to the polar coding for the $M$-user multiple access channel [9].
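A one-step illustration of this argument, not a proof, can be computed by brute force over $\mathbb{F}_4$. The sketch below assumes the representation $\mathbb{F}_4 = \mathbb{F}_2[a]/(a^2 + a + 1)$ with elements stored as 2-bit integers, and a hypothetical source in which $X$ is uniform and $Y$ reveals only the low coordinate of $X$, so that $H(X \mid Y) = 1/2$ in base 4. It compares the $2 \times 2$ lower triangular matrix with off-diagonal entry 1 (which lies in $\mathbb{F}_2$) against the one with off-diagonal entry $a$ (which generates $\mathbb{F}_4$).

```python
import math
from collections import defaultdict
from itertools import product

Q = 4  # F_4 = F_2[a]/(a^2 + a + 1); element b1*a + b0 is the integer 2*b1 + b0

def add(x, y):
    return x ^ y

def mul(x, y):
    r = 0
    for i in range(2):
        if (y >> i) & 1:
            r ^= x << i
    if r & 0b100:          # reduce a^2 -> a + 1
        r ^= 0b111
    return r

def cond_entropy(joint):
    """H(A | B) in base Q from a dict {(a, b): probability}."""
    marg = defaultdict(float)
    for (_, b), p in joint.items():
        marg[b] += p
    return -sum(p * math.log(p / marg[b], Q) for (_, b), p in joint.items())

def split_entropies(gamma):
    """(H(U_0 | Y_0, Y_1), H(U_1 | U_0, Y_0, Y_1)) for X = U [[1, 0], [gamma, 1]]."""
    j0, j1 = defaultdict(float), defaultdict(float)
    for u0, u1 in product(range(Q), repeat=2):
        x0, x1 = add(u0, mul(gamma, u1)), u1
        y = (x0 & 1, x1 & 1)   # each output reveals only the low coordinate
        j0[(u0, y)] += 1 / Q**2
        j1[(u1, (u0, y))] += 1 / Q**2
    return cond_entropy(j0), cond_entropy(j1)

h_gamma_1 = split_entropies(1)   # off-diagonal entry in the prime subfield F_2
h_gamma_a = split_entropies(2)   # off-diagonal entry a, which generates F_4
```

With the entry 1 both transformed entropies remain at 1/2, so this source is left unchanged in entropy by the transform, whereas with the entry $a$ the two entropies split into 1 and 0 after a single step.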



C. Sufficiency 

In the proof of sufficiency, $(X, Y)$ is assumed to be a symmetric channel. From Lemma 6, we do not lose generality by this assumption. For any $j \in \{0, \dots, \ell-1\}$, it holds via the chain rule for the entropy that

\[ \sum_{i=j}^{\ell-1} H(X^{(i)} \mid Y^{(i)}) = H(U_j^{\ell-1} \mid U_0^{j-1}, Y_0^{\ell-1}) = \sum_{i=j}^{\ell-1} H(U_i \mid U_0^{j-1}, U_{i+1}^{\ell-1}, Y_0^{\ell-1}) \tag{10} \]

for any $(X, Y)$. Let $\bar{G}$ be an arbitrary standard form of $G$, and assume that $U_0^{\ell-1}$ and $(X^{(i)}, Y^{(i)})$ for $i \in \{0, \dots, \ell-1\}$ are defined with $\bar{G}$. The Tanner-graph representation of the relationship among $U_0^{\ell-1}$, $X_0^{\ell-1}$ and $Y_0^{\ell-1}$ is shown in Fig. 1(a). All the terms in the sum on the rightmost side of (10) are at most $H(X \mid Y)$ for any symmetric channel $(X, Y)$ on a standard form $\bar{G}$. It also holds that $H(X_n^{(i)} \mid Y_n^{(i)}) - H(X_n \mid Y_n) \to 0$ with probability 1 as $n \to \infty$ for all $i \in \{0, \dots, \ell-1\}$ since $\{H(X_n \mid Y_n)\}_{n=0,1,\dots}$ converges almost surely. Combining these two facts, as well as Lemmas 8 and 9, one observes that each of the terms in the sum on the rightmost side of (10) evaluated with $(X, Y) = (X_n, Y_n)$ must be close to $H(X_n \mid Y_n)$ with probability 1 as $n \to \infty$. In particular,

\[ H(X_n \mid Y_n) - H(U_j \mid U_0^{j-1}, U_{j+1}^{\ell-1}, Y_0^{\ell-1}) \Big|_{(X, Y) = (X_n, Y_n)} \to 0 \tag{11} \]

holds with probability 1. This property allows us to reduce the problem of polarization with $\bar{G}$ to the properties of the basic transform with a $2 \times 2$ matrix, or equivalently, that with $G_\gamma$. Indeed, when one considers the quantity $H(U_j \mid U_0^{j-1}, U_{j+1}^{\ell-1}, Y_0^{\ell-1}) \big|_{(X, Y) = (X_n, Y_n)}$, one can safely prune the subgraphs related with the conditioning variables $U_0^{j-1}$, $U_{j+1}^{\ell-1}$ from Fig. 1(a) to obtain Fig. 1(b) due to Lemma 9, since $(X_n, Y_n)$ is always symmetric from Lemma 8. Then, one obtains an upper bound $H(U_j \mid U_0^{j-1}, U_{j+1}^{\ell-1}, Y_i, Y_j) \big|_{(X, Y) = (X_n, Y_n)}$ of $H(U_j \mid U_0^{j-1}, U_{j+1}^{\ell-1}, Y_0^{\ell-1}) \big|_{(X, Y) = (X_n, Y_n)}$ for $i < j$. Let $\gamma$ be the $(j, i)$-element of $\bar{G}$, and let $\{(X_n^{(0)}, Y_n^{(0)}), (X_n^{(1)}, Y_n^{(1)})\}$ be the random pairs obtained from $(X_n, Y_n)$ via the basic transform with the $2 \times 2$ matrix

\[ \begin{bmatrix} 1 & 0 \\ \gamma & 1 \end{bmatrix}. \]

Then, the above upper bound is nothing but $H(X_n^{(1)} \mid Y_n^{(1)})$. From (11), it holds that $H(X_n \mid Y_n) - H(X_n^{(1)} \mid Y_n^{(1)}) \to 0$ with probability 1. Therefore, the following proposition implies the sufficiency of the main theorem.



Proposition 14. Let $A$ be a non-empty subset of $\mathbb{F}_q^\times$. Let $\{(X_{(n)}, Y_{(n)})\}_{n=0,1,\dots}$ be a sequence of random pairs. Assume $H(X_{(n)} \mid Y_{(n)}) - H(X_{(n)}^{(1)} \mid Y_{(n)}^{(1)}) \to 0$ for all $G_\gamma$ where $\gamma \in A$. Then, for any $\epsilon > 0$, there exists $n_0 \in \mathbb{N}$ such that

\[ Z_{td}(X_{(n)} \mid Y_{(n)}) < \epsilon, \quad \text{for all } t \in \mathbb{F}_p(A)^\times \]
\[ \text{or} \quad Z_{td}(X_{(n)} \mid Y_{(n)}) > 1 - \epsilon, \quad \text{for all } t \in \mathbb{F}_p(A)^\times \]

for any $n \ge n_0$ and any $d \in \mathbb{F}_q^\times$.

When $\mathbb{F}_p(\bar{G}) = \mathbb{F}_q$, Proposition 14 states that the random sequence $H_n = H(X_n \mid Y_n)$ is close to 0 or 1 for sufficiently large $n$ with probability 1. Hence, $H_\infty$ must be $\{0, 1\}$-valued, i.e., any source $(X, Y)$ is polarized by $G$. Proposition 14 is proved by using Lemma 15 and Lemma 16 below.

Lemma 15. Let $\{(X_{(n)}, Y_{(n)})\}_{n=0,1,\dots}$ be a sequence of random pairs. Assume $H(X_{(n)} \mid Y_{(n)}) - H(X_{(n)}^{(1)} \mid Y_{(n)}^{(1)}) \to 0$ for $G_\gamma$ where $\gamma \in \mathbb{F}_q^\times$. Then, for any $\epsilon > 0$, there exists $n_0 \in \mathbb{N}$ such that

\[ Z_{\gamma^i d}(X_{(n)} \mid Y_{(n)}) < \epsilon, \quad \text{for all } i = 0, \dots, q-2 \]
\[ \text{or} \quad Z_{\gamma^i d}(X_{(n)} \mid Y_{(n)}) > 1 - \epsilon, \quad \text{for all } i = 0, \dots, q-2 \]

for any $n \ge n_0$ and any $d \in \mathbb{F}_q^\times$.

The proof of Lemma 15 is in Appendix A. Lemma 15 means that $\{Z_{\gamma^i d}(X_{(n)} \mid Y_{(n)})\}_{i=0,\dots,q-2}$ are simultaneously close to 0 or 1 for each $d \in \mathbb{F}_q^\times$. Lemma 15 implies Proposition 14 when $A \subseteq \mathbb{F}_q^\times$ includes a primitive element $\gamma$ of $\mathbb{F}_p(A)$, i.e., $\{\gamma^i \mid i = 0, 1, \dots, q-2\} = \mathbb{F}_p(A)^\times$. If not, we require the following lemma for proving Proposition 14.

Lemma 16 ([4]). For any $d_1$ and $d_2$ in $\mathbb{F}_q^\times$ satisfying $d_2 \ne -d_1$,

\[ \sqrt{1 - Z_{d_1 + d_2}(X \mid Y)} \le \sqrt{1 - Z_{d_1}(X \mid Y)} + \sqrt{1 - Z_{d_2}(X \mid Y)}. \]

Proof: Since

\[ 1 - Z_d(X \mid Y) = \frac{1}{2} \sum_{x \in \mathbb{F}_q} \sum_{y \in \mathcal{Y}} \left( \sqrt{P_{X,Y}(x, y)} - \sqrt{P_{X,Y}(x + d, y)} \right)^2, \]

the statement is obtained from the triangle inequality of the Euclidean distance. ∎

Consider the partitioning of $\mathbb{F}_q^\times$ according to the equivalence relation $d \sim d' \iff d/d' \in \mathbb{F}_p(A)^\times$. For any fixed $d \in \mathbb{F}_q^\times$, if $Z_d(X_{(n)} \mid Y_{(n)})$ is close to 1, then it follows from Lemma 15 and Lemma 16 that $Z_{d'}(X_{(n)} \mid Y_{(n)})$ is also close to 1 for all $d' \sim d$. Otherwise, all of $\{Z_{d'}(X_{(n)} \mid Y_{(n)})\}_{d' \sim d}$ must not be close to 1, which means from Lemma 15 that all of them must be close to 0. This completes the proof of Proposition 14.

VII. Error probability, total variation distance to the uniform distribution and speed of polarization 
A. Preliminaries 

In this section, we consider the speed of polarization by an $\ell \times \ell$ invertible matrix $G$ over $\mathbb{F}_q$. Let

\[ P_e(X \mid Y) := 1 - \sum_{y \in \mathcal{Y}} P_Y(y) \max_{x \in \mathbb{F}_q} P_{X \mid Y}(x \mid y). \]

This is the average error probability of the maximum a posteriori estimator $\hat{x}(y) := \arg\max_{x \in \mathbb{F}_q} P_{X \mid Y}(x \mid y)$ of $X$ given $Y$. The random quantity $P_e(X_n \mid Y_n)$ plays a key role in studying the speed of polarization. It provides a bound on the block error probability of polar codes with successive cancellation decoding applied to channel coding [3]. More precisely, if one has

\[ \Pr(P_e(X_n \mid Y_n) < \epsilon) > R \]

then it implies the existence of a polar code for channel coding with blocklength $\ell^n$, rate $R$, and block error probability at most $\ell^n R \epsilon$. Obviously, $P_e(X \mid Y)$ is invariant under any permutation of symbols in a posteriori distribution of $(X, Y)$. The average error probability $P_e(X \mid Y)$ takes a value in $[0, (q-1)/q]$. As has been the case in the study of the binary case [5], the Bhattacharyya parameter is useful for bounding the error probability.

Lemma 17.

\[ \frac{q-1}{q^2} \left( \sqrt{1 + (q-1) Z(X \mid Y)} - \sqrt{1 - Z(X \mid Y)} \right)^2 \le P_e(X \mid Y) \le \min_{k=1,\dots,q-1} \frac{(q-1) Z(X \mid Y) + k(k-1)}{k(k+1)}. \]

The proof of Lemma 17 is in Appendix B.
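The two-sided relation between the error probability and the Bhattacharyya parameter can be explored numerically. The sketch below assumes the bounds of Lemma 17 in the form $\frac{q-1}{q^2}(\sqrt{1+(q-1)Z} - \sqrt{1-Z})^2 \le P_e \le \min_k \frac{(q-1)Z + k(k-1)}{k(k+1)}$; this reading of the lemma, the function names, and the example posteriors are assumptions made for illustration.

```python
import math

def error_probability(p_y, post):
    """P_e(X | Y) = 1 - sum_y P_Y(y) max_x P_{X|Y}(x | y)."""
    return 1.0 - sum(py * max(row) for py, row in zip(p_y, post))

def bhattacharyya(p_y, post):
    """Z(X | Y) with normalization 1 / (q - 1)."""
    q = len(post[0])
    total = sum(py * math.sqrt(row[x] * row[xp])
                for py, row in zip(p_y, post)
                for x in range(q) for xp in range(q) if x != xp)
    return total / (q - 1)

def lemma17_bounds(z, q):
    """Lower and upper bounds on P_e as a function of Z (assumed form)."""
    lower = (q - 1) / q**2 * (math.sqrt(1 + (q - 1) * z)
                              - math.sqrt(max(0.0, 1 - z))) ** 2
    upper = min(((q - 1) * z + k * (k - 1)) / (k * (k + 1))
                for k in range(1, q))
    return lower, upper

# Two hypothetical single-output ternary sources
examples = [[[0.7, 0.2, 0.1]], [[0.5, 0.5, 0.0]]]
results = []
for post in examples:
    pe = error_probability([1.0], post)
    lo, hi = lemma17_bounds(bhattacharyya([1.0], post), 3)
    results.append((lo, pe, hi))
```

For the posterior uniform on two of the three symbols the upper bound is met with equality, and for $q = 2$ the expressions reduce to the familiar binary bounds $(1 - \sqrt{1 - Z^2})/2 \le P_e \le Z/2$.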

Another quantity which we study in this section is the expected total variation distance $T(X \mid Y)$ between the a posteriori probability and the uniform distribution, defined as

\[ T(X \mid Y) := \sum_{y \in \mathcal{Y}} P_Y(y) \sum_{x \in \mathbb{F}_q} \left| P_{X \mid Y}(x \mid y) - \frac{1}{q} \right|. \]

Properties of the random quantity $T(X_n \mid Y_n)$ are important in polar codes for lossy source coding [10], [11]. More precisely, if one has

\[ \Pr(T(X_n \mid Y_n) < \epsilon) > R \]

for the test channel $(X_0, Y_0) = (X, Y)$, then there exists a polar code for source coding with blocklength $\ell^n$, rate $1 - R$ and average distortion at most $D + d_{\max} \ell^n R \epsilon$, where $D$ denotes the average distortion for the test channel and where $d_{\max}$ is the maximum value of the distortion function [10], [11]. Note that $T(X \mid Y)$ is invariant under any permutation of symbols in a posteriori distribution. The total variation distance $T(X \mid Y)$ takes a value in $[0, 2(q-1)/q]$. The following lemma establishes a relationship between the total variation distance $T(X \mid Y)$ and the average error probability $P_e(X \mid Y)$.

Lemma 18.

\[ 2 \left( \frac{q-1}{q} - P_e(X \mid Y) \right) \le T(X \mid Y) \le \frac{2(q-1)}{q} - \frac{2}{q} \max_{k=1,\dots,q-1} \big( k(k+1) P_e(X \mid Y) - k(k-1) \big). \]

The proof is in Appendix C.
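The relation between the total variation distance and the error probability can likewise be checked on small examples. The sketch below assumes Lemma 18 in the form $2(\frac{q-1}{q} - P_e) \le T \le \frac{2(q-1)}{q} - \frac{2}{q}\max_k (k(k+1)P_e - k(k-1))$; this reading of the lemma and the example posteriors are assumptions made for illustration.

```python
def error_probability(p_y, post):
    """P_e(X | Y) = 1 - sum_y P_Y(y) max_x P_{X|Y}(x | y)."""
    return 1.0 - sum(py * max(row) for py, row in zip(p_y, post))

def total_variation(p_y, post):
    """T(X | Y): expected L1 distance of the posterior from the uniform law."""
    q = len(post[0])
    return sum(py * sum(abs(p - 1 / q) for p in row)
               for py, row in zip(p_y, post))

def lemma18_bounds(pe, q):
    """Lower and upper bounds on T as a function of P_e (assumed form)."""
    lower = 2 * ((q - 1) / q - pe)
    upper = 2 * (q - 1) / q - (2 / q) * max(
        k * (k + 1) * pe - k * (k - 1) for k in range(1, q))
    return lower, upper

# Hypothetical ternary posteriors (single output) and a binary source
ternary = [[[0.7, 0.2, 0.1]], [[0.4, 0.4, 0.2]], [[0.4, 0.3, 0.3]]]
checks = []
for post in ternary:
    pe = error_probability([1.0], post)
    lo, hi = lemma18_bounds(pe, 3)
    checks.append((lo, total_variation([1.0], post), hi))

binary_py, binary_post = [0.5, 0.5], [[0.9, 0.1], [0.3, 0.7]]
pe2 = error_probability(binary_py, binary_post)
t2 = total_variation(binary_py, binary_post)
```

For $q = 2$ both bounds collapse to $T = 1 - 2P_e$, which the binary example reproduces.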

The Fourier transform of the a posteriori probability is defined for analyzing $T(X \mid Y)$.

Definition 19 (Character). Let $\omega_p \in \mathbb{C}$ be a primitive complex $p$-th root of unity. Define $\chi(x) := \omega_p^{\mathrm{Tr}(x)}$ for any $x \in \mathbb{F}_q$, where $\mathrm{Tr} : \mathbb{F}_q \to \mathbb{F}_p$ is defined as $x \mapsto \sum_{i=0}^{m-1} x^{p^i}$. Here, $\mathrm{Tr}(x) \in \mathbb{F}_p$ appearing in the exponent should be regarded as an integer via the natural correspondence between $\mathbb{F}_p$ and $\mathbb{Z}/p\mathbb{Z}$.

From the definition of $\chi(x)$, it satisfies the following properties.

\[ \chi(0) = 1, \qquad |\chi(x)| = 1 \ \text{ for any } x \in \mathbb{F}_q, \]
\[ \chi(x + z) = \chi(x)\chi(z) \ \text{ for any } x, z \in \mathbb{F}_q, \qquad \sum_{x \in \mathbb{F}_q} \chi(x) = 0. \]

In this paper, we only use $\chi(x)$ through these properties.



Definition 20 (Fourier transform). For any fixed $y \in \mathcal{Y}$, the Fourier transform of the a posteriori probability $P_{X \mid Y}$ of a source $(X, Y)$ is defined as

\[ P^*_{X \mid Y}(w \mid y) := \sum_{z \in \mathbb{F}_q} P_{X \mid Y}(z \mid y) \chi(wz) \]

for $w \in \mathbb{F}_q$.

Note that $P^*_{X \mid Y}(0 \mid y) = 1$ for any $y \in \mathcal{Y}$. Like the role of $Z(X \mid Y)$ in studying $P_e(X \mid Y)$, the auxiliary quantity $S(X \mid Y)$, defined as

\[ S(X \mid Y) := \frac{1}{q-1} \sum_{w \in \mathbb{F}_q^\times} \sum_{y \in \mathcal{Y}} P_Y(y) \left| P^*_{X \mid Y}(w \mid y) \right|, \]

can be used for analyzing $T(X \mid Y)$. The quantity $S(X \mid Y)$ takes a value in $[0, 1]$. Note that, although $S(X \mid Y)$ is identical to $T(X \mid Y)$ (and $1 - 2P_e(X \mid Y)$) when $q = 2$, $S(X \mid Y)$ is in general different from $T(X \mid Y)$. In this regard, consideration of the quantity $S(X \mid Y)$ is a novel idea that comes into play when one considers non-binary cases. Although $S(X \mid Y)$ is not invariant under any permutation of symbols in a posteriori distribution, $S(X \mid Y)$ is invariant under a permutation of symbols in a posteriori distribution when the permutation is addition or multiplication on the finite field, i.e., $S(X \mid Y) = S(r(Y)X + d(Y) \mid Y)$ for any $d : \mathcal{Y} \to \mathbb{F}_q$ and $r : \mathcal{Y} \to \mathbb{F}_q^\times$. Hence, if $(X, Y) \overset{a}{\sim} (X', Y')$, it holds that $S(X \mid Y) = S(X' \mid Y')$.
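To make the character and the Fourier transform concrete, the sketch below works over $\mathbb{F}_4$, where $p = 2$, $\omega_2 = -1$ and the trace is $x + x^2$. The representation of $\mathbb{F}_4$ as 2-bit integers modulo $a^2 + a + 1$, the function names, and the normalization of the averaged quantity by $1/(q-1)$ over nonzero $w$ are assumptions made for illustration.

```python
Q = 4  # F_4 = F_2[a]/(a^2 + a + 1); element b1*a + b0 is the integer 2*b1 + b0

def mul(x, y):
    r = 0
    for i in range(2):
        if (y >> i) & 1:
            r ^= x << i
    if r & 0b100:          # reduce a^2 -> a + 1
        r ^= 0b111
    return r

def trace(x):
    """Tr(x) = x + x^2, which always lies in the prime subfield {0, 1}."""
    return x ^ mul(x, x)

def chi(x):
    """Additive character chi(x) = (-1)^{Tr(x)} of F_4."""
    return -1.0 if trace(x) else 1.0

def fourier(post_row, w):
    """Fourier transform of one posterior row at frequency w."""
    return sum(p * chi(mul(w, z)) for z, p in enumerate(post_row))

def s_quantity(p_y, post):
    """Average of |Fourier transform| over nonzero w, weighted by P_Y."""
    return sum(py * abs(fourier(row, w))
               for py, row in zip(p_y, post) for w in range(1, Q)) / (Q - 1)

s_deterministic = s_quantity([1.0], [[0.0, 1.0, 0.0, 0.0]])
s_uniform = s_quantity([1.0], [[0.25, 0.25, 0.25, 0.25]])
```

Addition in this representation is bitwise XOR, so the multiplicative property of the character reads `chi(x ^ z) == chi(x) * chi(z)`; the averaged quantity is 1 for a deterministic posterior and 0 for the uniform one.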

The following lemma relates the quantity $S(X \mid Y)$ with the average error probability $P_e(X \mid Y)$.

Lemma 21.

\[ 1 - \frac{q}{q-1} P_e(X \mid Y) \le S(X \mid Y) \le \min_{k=1,\dots,q-1} k(k+1) \left[ \left( \frac{k}{k+1} - P_e(X \mid Y) \right) \sqrt{1 - \frac{q}{q-1} \frac{k-1}{k}} + \left( P_e(X \mid Y) - \frac{k-1}{k} \right) \sqrt{1 - \frac{q}{q-1} \frac{k}{k+1}} \right]. \]

The proof is in Appendix D.

We now define the following equivalence relation for establishing relationships among several quantities for a source $(X, Y)$ defined so far.

Definition 22. For $A(X \mid Y) \in [0, 1]$ and $B(X \mid Y) \in [0, 1]$, we say $A(X \mid Y) \asymp B(X \mid Y)$ if and only if there exist $\epsilon > 0$ and $c \in (0, 1]$ such that if $B(X \mid Y) < \epsilon$,

\[ B(X \mid Y)^{1/c} \le A(X \mid Y) \le B(X \mid Y)^{c} \]

and if $1 - B(X \mid Y) < \epsilon$,

\[ (1 - B(X \mid Y))^{1/c} \le 1 - A(X \mid Y) \le (1 - B(X \mid Y))^{c} \]

for any source $(X, Y)$.

From Lemmas 17, 18 and 21, the following corollary is obtained.

Corollary 23. $(q/(q-1)) P_e(X \mid Y) \asymp Z(X \mid Y) \asymp 1 - (q/(2(q-1))) T(X \mid Y) \asymp 1 - S(X \mid Y)$.

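The following four quantities are used in the derivation of the speed of polarization in the next subsection.

Definition 24. For any channel $(X, Y)$, $Z_{\max}(X, Y)$ and $Z_{\min}(X, Y)$ are defined as

\[ Z_{\max}(X, Y) := \max_{\substack{x \in \mathbb{F}_q,\, x' \in \mathbb{F}_q \\ x \ne x'}} \sum_{y \in \mathcal{Y}} \sqrt{P_{Y \mid X}(y \mid x) P_{Y \mid X}(y \mid x')} \]

\[ Z_{\min}(X, Y) := \min_{\substack{x \in \mathbb{F}_q,\, x' \in \mathbb{F}_q \\ x \ne x'}} \sum_{y \in \mathcal{Y}} \sqrt{P_{Y \mid X}(y \mid x) P_{Y \mid X}(y \mid x')}. \]

For any source $(X, Y)$, $S_{\max}(X, Y)$ and $S_{\min}(X, Y)$ are defined as

\[ S_{\max}(X, Y) := \max_{w \in \mathbb{F}_q^\times} \sum_{y \in \mathcal{Y}} P_Y(y) \left| P^*_{X \mid Y}(w \mid y) \right| \]

\[ S_{\min}(X, Y) := \min_{w \in \mathbb{F}_q^\times} \sum_{y \in \mathcal{Y}} P_Y(y) \left| P^*_{X \mid Y}(w \mid y) \right|. \]

The quantities $Z_{\max}(X, Y)$ and $Z_{\min}(X, Y)$ are invariant under any permutation of symbols in a posteriori distribution. Although $S_{\max}(X, Y)$ and $S_{\min}(X, Y)$ are not invariant under any permutation of symbols in a posteriori distribution, it holds that $S_{\max/\min}(X \mid Y) = S_{\max/\min}(rX + d(Y) \mid Y)$ for any $d : \mathcal{Y} \to \mathbb{F}_q$ and $r \in \mathbb{F}_q^\times$. Hence, if $(X, Y) \overset{a}{\sim} (X', Y')$, it holds that $S_{\max/\min}(X \mid Y) = S_{\max/\min}(X' \mid Y')$. It is also straightforward to see that the inequalities $Z_{\min}(X, Y) \le Z(X \mid Y) \le Z_{\max}(X, Y)$ and $S_{\min}(X, Y) \le S(X \mid Y) \le S_{\max}(X, Y)$ hold.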



B. Speed of polarization 

In this subsection, we assume that $H(X \mid Y) \in (0, 1)$, and also assume, in view of Lemma 6 and without loss of generality, that $(X,Y)$ is a channel. The following theorem holds, which was shown by Arıkan and Telatar [12], Korada et al. [5], and Korada [13] for the binary case with an additional condition.



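Theorem 25. If a channel $(X, Y)$ is polarized by $G$, it holds that for any $\epsilon > 0$,

\begin{align*}
\lim_{n \to \infty} \Pr\left( P_e(X_n \mid Y_n) < 2^{-\ell^{(E_c(G) - \epsilon) n}} \right) &= 1 - H(X \mid Y) \\
\lim_{n \to \infty} \Pr\left( P_e(X_n \mid Y_n) < 2^{-\ell^{(E_c(G) + \epsilon) n}} \right) &= 0 \tag{12}
\end{align*}

where

\[ E_c(G) := \frac{1}{\ell} \sum_{i=0}^{\ell-1} \log_\ell D_c^{(i)}(G) \]

is the quantity called the exponent of $G$ for channel coding, and where $D_c^{(i)}(G)$ denotes the Hamming distance between the $i$-th row of $G$ and the linear space spanned by the $(i+1)$-th row to the $(\ell-1)$-th row of $G$. Furthermore, it holds that for any $\epsilon > 0$,

\begin{align*}
\lim_{n \to \infty} \Pr\left( T(X_n \mid Y_n) < 2^{-\ell^{(E_s(G) - \epsilon) n}} \right) &= H(X \mid Y) \\
\lim_{n \to \infty} \Pr\left( T(X_n \mid Y_n) < 2^{-\ell^{(E_s(G) + \epsilon) n}} \right) &= 0 \tag{13}
\end{align*}

where

\[ E_s(G) := \frac{1}{\ell} \sum_{i=0}^{\ell-1} \log_\ell D_s^{(i)}(G) \]

is the quantity called the exponent of $G$ for source coding, and where $D_s^{(i)}(G)$ denotes the Hamming distance between the $i$-th column of $G^{-1}$ and the linear space spanned by the 0-th column to the $(i-1)$-th column of $G^{-1}$.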
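The exponent for channel coding can be computed directly from its definition for small matrices. The following sketch does this by brute force over a prime field $\mathbb{F}_p$ (an assumption made here so that arithmetic is plain modular arithmetic; it does not cover prime-power fields). The function names and the two example matrices are illustrative.

```python
import itertools
import math

def row_distances(G, p):
    """D_c^{(i)}: Hamming distance from row i to the span of rows i+1..l-1 over F_p (p prime)."""
    l = len(G)
    dists = []
    for i in range(l):
        below = G[i + 1:]
        best = l
        # Enumerate every vector in the span of the rows below row i.
        for coeffs in itertools.product(range(p), repeat=len(below)):
            v = list(G[i])
            for c, row in zip(coeffs, below):
                v = [(a - c * b) % p for a, b in zip(v, row)]
            best = min(best, sum(1 for a in v if a != 0))
        dists.append(best)
    return dists

def exponent(dists):
    """(1/l) * sum_i log_l D^{(i)}."""
    l = len(dists)
    return sum(math.log(d, l) for d in dists) / l

binary_kernel = [[1, 0], [1, 1]]                    # the 2x2 binary matrix of the original proposal
ternary_rs = [[1, 1, 0], [1, 2, 0], [1, 1, 1]]      # rows evaluate X^2, X, 1 at the points 1, 2, 0 of F_3
```

For `binary_kernel` the distances are `[1, 2]` and the exponent is 1/2; for `ternary_rs` the distances are `[1, 2, 3]`, giving $\log_3(3!)/3 \approx 0.5437$.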

Remark 1. Korada proved (13) for the binary case with the aid of the condition $D_s^{(i)}(G) \ge D_s^{(i+1)}(G)$ for $i = 1, \dots, \ell - 2$ [13]. In this paper, (13) is proved without any additional condition for both binary and non-binary cases.

From Theorem 25, the error probability of polar codes as channel codes of rate smaller than $I(W)$ and the distortion gap to the optimal distortion of polar codes as source codes are asymptotically bounded by $2^{-\ell^{(E_c(G)-\epsilon)n}}$ and $2^{-\ell^{(E_s(G)-\epsilon)n}}$, respectively [13]. From Corollary 23, it is sufficient to prove (12) and (13) for $Z(X_n \mid Y_n)$ and $S(X_n \mid Y_n)$ instead of $P_e(X_n \mid Y_n)$ and $T(X_n \mid Y_n)$, respectively. The general proof shown in [14], [13] can be used for our purpose.

Lemma 26 ([14], [13]). Let $\{Z_n\}_{n=0,1,\dots}$ be a random process ranging in $[0, 1]$ and $\{D_n\}_{n=0,1,\dots}$ be i.i.d. random variables ranging in $[1, \infty)$. Assume that the expectation of $\log D_0$ exists. Four conditions (c0)-(c3) are defined as follows.

(c0) $Z_n \in (0, 1)$ with probability 1.

(c1) There exists a random variable $Z_\infty$ such that $Z_n \to Z_\infty$ almost surely.

(c2) There exists a positive constant $c_0$ such that $Z_{n+1} \le c_0 Z_n^{D_n}$ with probability 1.

(c3) $Z_n^{D_n} \le Z_{n+1}$ with probability 1.

If (c0), (c1) and (c2) are satisfied, it holds that

\[ \lim_{n \to \infty} \Pr\left( Z_n < 2^{-\ell^{(E[\log_\ell D_0] - \epsilon) n}} \right) = \Pr(Z_\infty = 0). \tag{14} \]

If (c0), (c1) and (c3) are satisfied, it holds that

\[ \lim_{n \to \infty} \Pr\left( Z_n < 2^{-\ell^{(E[\log_\ell D_0] + \epsilon) n}} \right) = 0. \tag{15} \]

In the above, $\ell$ is any constant greater than 1.

Theorem 25 is proved by applying Lemma 26 to appropriate pairs of random processes. The first equation of (12) is proved by confirming that the pair of the random processes $\{Z_n = Z_{\max}(X_n, Y_n)\}_{n=0,1,\dots}$ and $\{D_n = D_c^{(B_n)}(G)\}_{n=0,1,\dots}$ satisfies the conditions (c0), (c1) and (c2) and then by applying (14). The first equation of (13) is proved in the same way, by observing that the pair of the processes $\{Z_n = S_{\max}(X_n, Y_n)\}_{n=0,1,\dots}$ and $\{D_n = D_s^{(B_n)}(G)\}_{n=0,1,\dots}$ satisfies the conditions (c0), (c1) and (c2) and then by applying (14). The proof of the second equation of (12) is obtained similarly, by confirming that the pair of the processes $\{Z_n = Z_{\min}(X_n, Y_n)\}_{n=0,1,\dots}$ and $\{D_n = D_c^{(B_n)}(G)\}_{n=0,1,\dots}$ satisfies the conditions (c0), (c1) and (c3) and then by applying (15). For the proof of the second equation of (13), one should confirm that the pair of the processes $\{Z_n = S_{\min}(X_n, Y_n)\}_{n=0,1,\dots}$ and $\{D_n = D_s^{(B_n)}(G)\}_{n=0,1,\dots}$ satisfies the conditions (c0), (c1) and (c3) and then apply (15).

From Lemma 13 and Corollary 23, the condition (c1) is satisfied by $Z_n = Z_{\max}(X_n, Y_n)$, $Z_{\min}(X_n, Y_n)$, $S_{\max}(X_n, Y_n)$ and $S_{\min}(X_n, Y_n)$ when a channel $(X, Y)$ is polarized by $G$. In these cases, it holds that $Z_\infty \in \{0, 1\}$ with probability 1, and that $\Pr(Z_\infty = 0) = 1 - H(X \mid Y)$ for $Z_n = Z_{\max}(X_n, Y_n)$ and $Z_{\min}(X_n, Y_n)$, and $\Pr(Z_\infty = 0) = H(X \mid Y)$ for $Z_n = S_{\max}(X_n, Y_n)$ and $S_{\min}(X_n, Y_n)$. The following lemma shows that the pair of $\{Z_n = Z_{\max}(X_n, Y_n)\}_{n=0,1,\dots}$ and $\{D_n = D_c^{(B_n)}(G)\}_{n=0,1,\dots}$ satisfies the condition (c2), and that the pair of $\{Z_n = Z_{\min}(X_n, Y_n)\}_{n=0,1,\dots}$ and $\{D_n = D_c^{(B_n)}(G)\}_{n=0,1,\dots}$ satisfies the condition (c3).

Lemma 27 ([5]). For $i \in \{0, \dots, \ell-1\}$, it holds for any channel $(X, Y)$ that

\[ Z_{\max}(X^{(i)}, Y^{(i)}) \le q^{\ell-1-i} Z_{\max}(X, Y)^{D_c^{(i)}(G)} \]

\[ Z_{\min}(X, Y)^{D_c^{(i)}(G)} \le Z_{\min}(X^{(i)}, Y^{(i)}). \]

The proof is omitted since the same proof for the binary alphabet in [5] applies to the non-binary cases as well. The following lemma shows that the pair of $\{Z_n = S_{\max}(X_n, Y_n)\}_{n=0,1,\dots}$ and $\{D_n = D_s^{(B_n)}(G)\}_{n=0,1,\dots}$ satisfies the condition (c2), and that the pair of $\{Z_n = S_{\min}(X_n, Y_n)\}_{n=0,1,\dots}$ and $\{D_n = D_s^{(B_n)}(G)\}_{n=0,1,\dots}$ satisfies the condition (c3).

Lemma 28. For $i \in \{0, \dots, \ell-1\}$, it holds for any source $(X, Y)$ that

\[ S_{\max}(X^{(i)}, Y^{(i)}) \le q^{i} S_{\max}(X, Y)^{D_s^{(i)}(G)} \]

\[ S_{\min}(X, Y)^{D_s^{(i)}(G)} \le S_{\min}(X^{(i)}, Y^{(i)}). \]
The proof is in Appendix E. Since $S_{\max}(X, Y) = 1 \iff S(X \mid Y) = 1$, it always holds that $S_{\max}(X, Y) < 1$ and hence the condition (c0) is satisfied by the pair of $\{Z_n = S_{\max}(X_n, Y_n)\}_{n=0,1,\dots}$ and $\{D_n = D_s^{(B_n)}(G)\}_{n=0,1,\dots}$. Hence, the first equality
in (13) is proved by applying Lemma 26 to the pair of $\{Z_n = S_{\max}(X_n, Y_n)\}_{n=0,1,\dots}$ and $\{D_n = D_s^{(B_n)}(G)\}_{n=0,1,\dots}$. When $0 < S_{\min}(X, Y)$, the condition (c0) is satisfied by the pair of $\{Z_n = S_{\min}(X_n, Y_n)\}_{n=0,1,\dots}$ and $\{D_n = D_s^{(B_n)}(G)\}_{n=0,1,\dots}$, and hence (15) can be applied to the pair to prove the second equality in (13). When $0 < Z_{\min}(X, Y) \le Z_{\max}(X, Y) < 1$, (14) and (15) can be applied to the pair of $\{Z_n = Z_{\max}(X_n, Y_n)\}_{n=0,1,\dots}$ and $\{D_n = D_c^{(B_n)}(G)\}_{n=0,1,\dots}$ and to the pair of $\{Z_n = Z_{\min}(X_n, Y_n)\}_{n=0,1,\dots}$ and $\{D_n = D_c^{(B_n)}(G)\}_{n=0,1,\dots}$, respectively, since the pairs satisfy the condition (c0), which proves (12). For the other cases, it is sufficient to prove the following lemma.

Lemma 29.

\[ \lim_{n \to \infty} \Pr\left( Z_{\min}(X_n, Y_n) > 0,\ Z_{\max}(X_n, Y_n) < 1 \right) = 1 \]

\[ \lim_{n \to \infty} \Pr\left( S_{\min}(X_n, Y_n) > 0 \right) = 1. \]

Proof: For the first equation, let us consider a $\sigma(B_1, \dots, B_n)$-measurable random process $\{\xi_n := \xi(X_n, Y_n)\}_{n=0,1,\dots}$ where

\[ \xi(X, Y) := \left( \{(x,x') \in \mathbb{F}_q^2 \mid Z_{x,x'}(X \mid Y) = 0\},\ \{(x,x') \in \mathbb{F}_q^2 \mid Z_{x,x'}(X \mid Y) = 1\} \right) \]

and where

\[ Z_{x,x'}(X \mid Y) := \sum_{y \in \mathcal{Y}} \sqrt{P_{Y|X}(y \mid x) P_{Y|X}(y \mid x')}. \]

Then, $\{\xi_n\}_{n=0,1,\dots}$ is obviously a Markov chain. The Markov chain $\{\xi_n\}_{n=0,1,\dots}$ has the absorbing state $(\emptyset, \emptyset)$, i.e., $\Pr(\xi_n = (\emptyset, \emptyset) \mid \xi_{n-1} = (\emptyset, \emptyset)) = 1$. Although there are two other absorbing states $(\{(x,x') \in \mathbb{F}_q^2 \mid x \ne x'\}, \{(x,x') \in \mathbb{F}_q^2 \mid x = x'\})$ and $(\emptyset, \mathbb{F}_q^2)$, these states are isolated, i.e., they are not accessible from other states. From any state which is accessible from the initial state $\xi_0$, the state $(\emptyset, \emptyset)$ is accessible, both since any source accessible from the original source $(X,Y)$ by $G$ is also polarized by $G$, and since once a source satisfies $Z_{\max}(X_{n-1}, Y_{n-1}) < 1$ or $Z_{\min}(X_{n-1}, Y_{n-1}) > 0$, the offspring sources also satisfy $Z_{\max}(X_n, Y_n) < 1$ or $Z_{\min}(X_n, Y_n) > 0$, respectively. Hence, $\lim_{n \to \infty} \Pr(\xi_n = (\emptyset, \emptyset)) = 1$, proving the first equation of the lemma.

The second equation is obtained in the same way. Let us define

\[ S_w(X \mid Y) := \sum_{y \in \mathcal{Y}} P_Y(y) \left| P^*_{X|Y}(w \mid y) \right| \]

and let $\{\eta_n := \eta(X_n, Y_n)\}_{n=0,1,\dots}$ be a $\sigma(B_1, \dots, B_n)$-measurable random process where

\[ \eta(X, Y) := \left( \{ w \in \mathbb{F}_q \mid S_w(X \mid Y) = 0 \} \right). \]

Then, $\{\eta_n\}_{n=0,1,\dots}$ is a Markov chain since one obtains from the derivations of (25) and (26) in Appendix E that

\[ \max_{z \in C_i(w)} \prod_{j=0}^{\ell-1} S_{z_j}(X \mid Y) \le S_w(X^{(i)} \mid Y^{(i)}) \le q^i \max_{z \in C_i(w)} \prod_{j=0}^{\ell-1} S_{z_j}(X \mid Y) \]

for any $w \in \mathbb{F}_q^\times$ and $i = 0, \dots, \ell-1$, where $C_i(w)$ is the affine space $\{ \sum_{j=0}^{i-1} a_j h_j^t + w h_i^t \mid a_0^{i-1} \in \mathbb{F}_q^i \}$ defined on the basis of the columns of $G^{-1} := [h_0, h_1, \dots, h_{\ell-1}]$. The superscript $t$ here denotes the transpose of a vector. Then, it holds that $\lim_{n \to \infty} \Pr(\eta_n = \emptyset) = 1$ for the same reason as that for $\{\xi_n\}_{n=0,1,\dots}$. ■
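A more detailed asymptotic analysis depending on the rate can also be performed, as shown in [16], [17], [14], [13] for the binary case. For example, under the condition that $G$ polarizes $(X, Y)$, one can prove that for $R \in (0, 1 - H(X \mid Y))$,

\[ \lim_{n \to \infty} \Pr\left( P_e(X_n \mid Y_n) < 2^{-\ell^{\, E_c(G) n + \sqrt{V_c(G) n}\, Q^{-1}\left( \frac{R}{1 - H(X \mid Y)} \right) + f(n)}} \right) = R \tag{16} \]

holds for an arbitrary function satisfying $f(n) = o(\sqrt{n})$, where

\[ V_c(G) := \frac{1}{\ell} \sum_{i=0}^{\ell-1} \left( \log_\ell D_c^{(i)}(G) - E_c(G) \right)^2 \]

and where $Q^{-1}(\cdot)$ is the inverse function of the error function $Q(t) := \int_t^\infty e^{-z^2/2} \, dz / \sqrt{2\pi}$.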

In the binary case, any source is polarized by $G$ if and only if $E_c(G) > 0$ [5]. The property also holds when $q$ is a prime since the condition $E_c(G) > 0$ is equivalent to the condition that a standard form of $G$ is not the identity matrix. However, it no longer holds when $q$ is not a prime, in which case there may be sources which are not polarized by $G$ even if $E_c(G) > 0$, as shown in Section VI-B. Since non-zero scalar multiplication of a column does not change the exponent $E_c(G)$, even if there are non-polarizing sources for $G$ satisfying $E_c(G) > 0$, appropriate scalar multiplication of a column of $G$ gives a matrix with the same exponent $E_c(G)$ which polarizes any source.
VIII. Reed-Solomon matrix and its exponent



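Let $\mathbb{F}_q = \{x_0, \dots, x_{q-1}\}$. Let $a = [a_0, \dots, a_{k-1}] \in \mathbb{F}_q^k$ and $p_a(X) := a_0 + a_1 X + \dots + a_{k-1} X^{k-1}$. The encoder of the $q$-ary extended Reed-Solomon code is defined as $\varphi(a) := [p_a(x_0), p_a(x_1), \dots, p_a(x_{q-1})]$. Let $\alpha$ be a primitive element of $\mathbb{F}_q$. When $x_{q-1} = 0$ and $x_i = \alpha^i$ for $i = 0, \dots, q-2$, the generator matrix of the $q$-ary extended Reed-Solomon code is a lower submatrix of the $q \times q$ matrix $G_{RS}(q)$ over $\mathbb{F}_q$, which we call the Reed-Solomon matrix:

\[ G_{RS}(q) := \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 & 0 \\
1 & \alpha^{q-2} & \alpha^{2(q-2)} & \cdots & \alpha^{(q-2)(q-2)} & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & \alpha^{2} & \alpha^{4} & \cdots & \alpha^{2(q-2)} & 0 \\
1 & \alpha & \alpha^{2} & \cdots & \alpha^{q-2} & 0 \\
1 & 1 & 1 & \cdots & 1 & 1
\end{bmatrix}. \]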



From Theorem QTJ any source is polarized by the Reed-Solomon matrix. Since extended Reed-Solomon codes are maximum 
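distance separable (MDS) codes, one has $D_c^{(i)}(G_{RS}(q)) = i + 1$ for $i = 0, \dots, q-1$, and therefore the exponent of the Reed-Solomon matrix for channel coding is $E_c(G_{RS}(q)) = \log(q!)/(q \log q)$. The inverse matrix of the Reed-Solomon matrix $G_{RS}(q)$ is

\[ G_{RS}(q)^{-1} = - \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 & 0 \\
1 & \alpha^{-(q-2)} & \alpha^{-(q-3)} & \cdots & \alpha^{-1} & 0 \\
1 & \alpha^{-2(q-2)} & \alpha^{-2(q-3)} & \cdots & \alpha^{-2} & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & \alpha^{-(q-2)(q-2)} & \alpha^{-(q-2)(q-3)} & \cdots & \alpha^{-(q-2)} & 0 \\
1 & 0 & 0 & \cdots & 0 & -1
\end{bmatrix}. \]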



Hence, the exponent of the Reed-Solomon matrix for source coding is also $E_s(G_{RS}(q)) = \log(q!)/(q \log q)$. Note that both exponents $\log(q!)/(q \log q)$ increase monotonically in $q$ and converge to 1 as $q \to \infty$.
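The construction above can be checked numerically for small prime $q$. The following sketch is written under explicit assumptions that are not taken verbatim from the paper: $q$ is prime (so field arithmetic is modular arithmetic), row $r$ of the matrix evaluates the monomial $X^{q-1-r}$ at the points $1, \alpha, \dots, \alpha^{q-2}, 0$, and the inverse is entered in closed form as $-x^{r}$ for the middle columns. All function names are hypothetical.

```python
import math

def points(q, alpha):
    """Evaluation points x_i = alpha^i for i < q-1, followed by x_{q-1} = 0."""
    return [pow(alpha, i, q) for i in range(q - 1)] + [0]

def rs_matrix(q, alpha):
    """Row r holds the evaluations of X^(q-1-r); pow(0, 0, q) is 1, matching 0^0 = 1."""
    pts = points(q, alpha)
    return [[pow(x, q - 1 - r, q) for x in pts] for r in range(q)]

def rs_inverse(q, alpha):
    """Closed-form candidate for the inverse, checked against rs_matrix in the tests."""
    pts = points(q, alpha)
    H = [[0] * q for _ in range(q)]
    for i, x in enumerate(pts):
        H[i][0] = (-1) % q                    # column paired with the monomial X^(q-1)
        for r in range(1, q - 1):             # columns paired with degrees 1..q-2
            H[i][r] = (-pow(x, r, q)) % q     # x^r equals x^{-(q-1-r)} for x != 0, and is 0 at x = 0
        H[i][q - 1] = 1 if x == 0 else 0      # column paired with the constant monomial
    return H

def matmul_mod(A, B, q):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % q for j in range(n)] for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def rs_exponent(q):
    """log(q!) / (q log q)."""
    return math.log(math.factorial(q)) / (q * math.log(q))
```

For instance, `matmul_mod(rs_matrix(5, 2), rs_inverse(5, 2), 5)` is the identity matrix, and `rs_exponent(4)` evaluates to about 0.57312.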

For $i \in \{0, 1, \dots, q^n - 1\}$, $i_n i_{n-1} \cdots i_1$ denotes the $q$-ary expansion of $i$. For polar codes constructed on the basis of the matrix $G_{RS}(q)$, rows of $G_{RS}(q)^{\otimes n}$ whose indices are in the set

\[ \{ i \in \{0, \dots, q^n - 1\} \mid H(X^{(i_1) \cdots (i_n)} \mid Y^{(i_1) \cdots (i_n)}) < \epsilon \} \]

with some threshold $\epsilon > 0$ are chosen, as mentioned in Section III-B. For the Reed-Muller codes, on the other hand, rows of $G_{RS}(q)^{\otimes n}$ whose indices belong to

\[ \{ i \in \{0, \dots, q^n - 1\} \mid i_1 + \dots + i_n \ge n_0 \} \]

are chosen for some threshold $n_0 \in \{0, 1, \dots, n(q-1)\}$.$^3$ In order to maximize the minimum distance, rows of $G_{RS}(q)^{\otimes n}$ with indices in the set

\[ \{ i \in \{0, \dots, q^n - 1\} \mid (i_1 + 1) \cdots (i_n + 1) \ge n_0 \} \tag{17} \]

with some threshold $n_0 \in \{1, 2, \dots, q^n\}$ should be chosen. Hence, unless $q = 2$, the selection rule for the Reed-Muller codes does not maximize the minimum distance. Codes based on the selection rule (17) are sometimes called Massey-Costello-Justesen codes [18] and hyperbolic cascaded Reed-Solomon codes [19]. Note that the minimum distance of Reed-Muller codes grows like $q^{n/2 + o(n)}$ while the minimum distance of polar codes and hyperbolic codes grows like $q^{E_c(G_{RS}(q)) n + o(n)}$. From the above observation, the Reed-Solomon matrix can be regarded as a natural generalization of the matrix

\[ \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \]

in the binary case.
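The difference between the Reed-Muller rule and the hyperbolic rule (17) can be illustrated by listing the selected row indices. The sketch below assumes that both thresholds are read as non-strict inequalities, and uses the product $(i_1+1)\cdots(i_n+1)$ as the distance attached to row $i$, following $D_c^{(i)} = i + 1$ for the Reed-Solomon matrix; the function names are illustrative.

```python
import math

def digits(i, q, n):
    """q-ary digits (i_1, ..., i_n) of i, least significant digit first."""
    return [(i // q**k) % q for k in range(n)]

def row_product(i, q, n):
    """(i_1 + 1) * ... * (i_n + 1), the quantity thresholded in rule (17)."""
    return math.prod(d + 1 for d in digits(i, q, n))

def reed_muller_rows(q, n, n0):
    """Rows whose digit sum is at least n0 (digits added as integers)."""
    return {i for i in range(q**n) if sum(digits(i, q, n)) >= n0}

def hyperbolic_rows(q, n, n0):
    """Rows whose digit product (i_1+1)...(i_n+1) is at least n0."""
    return {i for i in range(q**n) if row_product(i, q, n) >= n0}

def min_product(rows, q, n):
    return min(row_product(i, q, n) for i in rows)

rm = reed_muller_rows(4, 2, 3)
hyp = hyperbolic_rows(4, 2, 4)
```

For $q = 4$ and $n = 2$, `rm` contains 10 rows and `hyp` contains 11 rows, both with smallest product 4, so the hyperbolic rule admits one more row at the same guaranteed product. For $q = 2$ the two rules select the same rows, since the product equals 2 raised to the digit sum.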



We now consider the maximum exponent $E_{\max}(q, \ell) := \max_{G \in \mathbb{F}_q^{\ell \times \ell}} E_c(G)$ for channel coding on a given size $q$ of a finite field and size $\ell$ of a matrix. For $q = 2$, Korada et al. [5] show that $E_{\max}(2, \ell) \le 0.55$ for $\ell \le 31$, and also show a method of construction of binary matrices with large exponents using the BCH codes. For $q > 2$ and $\ell \le q$, the $\ell \times \ell$ lower-right submatrix of the $q$-ary Reed-Solomon matrix gives the largest exponent so that $E_{\max}(q, \ell) = \log(\ell!)/(\ell \log \ell)$ for $\ell \le q$ since the Reed-Solomon code is an MDS code [20]. Thus, the Reed-Solomon matrices with $q > 2$ can be regarded as providing a systematic means to construct polar codes with larger exponents for the case $\ell \le q$. For example, for $q = 4$, $E_{\max}(4, 4) = E_c(G_{RS}(4)) \approx 0.57312$, which is larger than the upper bound 0.55 of $E_{\max}(2, \ell)$ for $\ell \le 31$. For $\ell > q > 2$, on the other hand, algebraic geometry codes are considered to be useful since they have a large minimum distance and the nested structure, which are plausible in making the $D_c^{(i)}$ larger. The examples using the Hermitian codes are shown in [2], in which $q = p^m$ and $\ell = p^{3m/2}$ for an even integer $m$. The $q$-ary $\ell \times \ell$ matrix constructed on the basis of the Hermitian code has a yet larger exponent than the Reed-Solomon matrix $G_{RS}(q)$ for $q \ge 4$.



$^3$ Here, $i_1, \dots, i_n$ are treated as integers in the additions.
[Figure 2: plot against the rate, with the rate axis ranging from 0.3 to 0.5.]

Fig. 2. Numerical results on the upper bound of the block error probability of polar codes over an AWGN channel, for which the standard deviation of noise is set equal to 0.97865. The capacity of the AWGN channel is about 0.5. Results of binary polar codes and quaternary polar codes using $G_{RS}(4)$ are shown by dotted curves and solid curves, respectively. Blocklengths are $2^7$, $2^9$, $2^{11}$, and $2^{13}$ viewed as binary codes.



IX. Numerical results 

In Fig. 2, performances of the original binary polar codes with $G_1$ and quaternary polar codes using the Reed-Solomon matrix $G_{RS}(4)$ are compared on the binary-input AWGN channel with capacity about 0.5. Instead of the actual error probability, the upper bound $\sum_{i \in A} P_e(X^{(i_1) \cdots (i_n)} \mid Y^{(i_1) \cdots (i_n)})$ is plotted, where $A$ denotes the set of chosen row indices in constructing polar codes. This bound is accurate for rates not close to the capacity [21]. A significant improvement by the quaternary polar codes over the binary counterparts is observed in terms of the block error probability, although the error probability of the quaternary polar codes is still larger than that of (3,6)-regular LDPC codes except in a low-rate region.

X. Summary 

We have shown that a necessary and sufficient condition for a $q$-ary $\ell \times \ell$ invertible matrix $G$ over $\mathbb{F}_q$ with a non-identity standard form $\bar{G}$ to polarize any source/channel is $\mathbb{F}_p(\bar{G}) = \mathbb{F}_q$. The result about the speed of polarization for the binary alphabet has been generalized to non-binary cases. We have also explicitly given $q$-ary $\ell \times \ell$ matrices with $\ell \le q$ on the basis of the $q$-ary Reed-Solomon matrices, which have the largest exponent $E_{\max}(q, \ell) = \log(\ell!)/(\ell \log \ell)$ among all $\ell \times \ell$ matrices. Performances of non-binary polar codes based on Reed-Solomon matrices are found via numerical evaluation to be significantly better than the performance of the original binary polar codes.

Appendix A 
Proof of Lemma 15

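In order to relate the entropy and the Bhattacharyya parameter, the following lemma is useful.

Lemma 30 ([22]). For any random variables $X$, $Y$ and $Z$ on sets $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$, respectively,

\[ \sum_{x \in \mathcal{X}, y \in \mathcal{Y}} P_{X,Y}(x,y) \log \frac{P_{X,Y}(x,y)}{P_X(x) P_Y(y)} \ge -\log \sum_{y \in \mathcal{Y}} \left( \sum_{x \in \mathcal{X}} P_X(x) \sqrt{P_{Y|X}(y \mid x)} \right)^2 \]

\[ \sum_{x \in \mathcal{X}, y \in \mathcal{Y}, z \in \mathcal{Z}} P_{X,Y,Z}(x,y,z) \log \frac{P_{X,Y|Z}(x,y \mid z)}{P_{X|Z}(x \mid z) P_{Y|Z}(y \mid z)} \ge -\log \sum_{y \in \mathcal{Y}, z \in \mathcal{Z}} P_Z(z) \left( \sum_{x \in \mathcal{X}} P_{X|Z}(x \mid z) \sqrt{P_{Y|X,Z}(y \mid x, z)} \right)^2. \]

Proof: The second inequality is an immediate consequence of the first inequality and Jensen's inequality. The first inequality is obtained by using Jensen's inequality twice as follows.

\begin{align*}
\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} P_{X,Y}(x,y) \log \frac{P_{X,Y}(x,y)}{P_X(x) P_Y(y)}
&= -2 \sum_{x \in \mathcal{X}, y \in \mathcal{Y}} P_{X,Y}(x,y) \log \sqrt{\frac{P_X(x) P_Y(y)}{P_{X,Y}(x,y)}} \\
&\ge -2 \sum_{y \in \mathcal{Y}} P_Y(y) \log \sum_{x \in \mathcal{X}} P_{X|Y}(x \mid y) \sqrt{\frac{P_X(x) P_Y(y)}{P_{X,Y}(x,y)}} \\
&\ge -\log \sum_{y \in \mathcal{Y}} P_Y(y) \left( \sum_{x \in \mathcal{X}} P_{X|Y}(x \mid y) \sqrt{\frac{P_X(x) P_Y(y)}{P_{X,Y}(x,y)}} \right)^2 \\
&= -\log \sum_{y \in \mathcal{Y}} \left( \sum_{x \in \mathcal{X}} P_X(x) \sqrt{P_{Y|X}(y \mid x)} \right)^2.
\end{align*}

■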

In Lemma 30, the quantities on the left-hand sides are the mutual information between $X$ and $Y$, and the conditional mutual information between $X$ and $Y$ given $Z$, respectively. The quantities on the right-hand sides are the cutoff rate and the conditional cutoff rate, respectively.
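The first inequality of Lemma 30 (mutual information is at least the cutoff rate) can be checked numerically on small joint distributions. The following is a minimal sketch with natural logarithms; the joint distribution is given as a nested list `P[x][y]`, and the function names are illustrative.

```python
import math
import random

def mutual_information(P):
    """I(X;Y) in nats for a joint distribution P[x][y]."""
    px = [sum(row) for row in P]
    py = [sum(P[x][y] for x in range(len(P))) for y in range(len(P[0]))]
    return sum(P[x][y] * math.log(P[x][y] / (px[x] * py[y]))
               for x in range(len(P)) for y in range(len(P[0])) if P[x][y] > 0)

def cutoff_rate(P):
    """-log sum_y (sum_x P_X(x) sqrt(P_{Y|X}(y|x)))^2, using P_X sqrt(P_{Y|X}) = sqrt(P_X P_{X,Y})."""
    px = [sum(row) for row in P]
    total = 0.0
    for y in range(len(P[0])):
        inner = sum(math.sqrt(px[x] * P[x][y]) for x in range(len(P)))
        total += inner ** 2
    return -math.log(total)

def random_joint(nx, ny):
    w = [[random.random() for _ in range(ny)] for _ in range(nx)]
    s = sum(sum(row) for row in w)
    return [[v / s for v in row] for row in w]

random.seed(1)
gaps = []
for _ in range(100):
    P = random_joint(3, 4)
    gaps.append(mutual_information(P) - cutoff_rate(P))

noiseless = [[0.5, 0.0], [0.0, 0.5]]  # X = Y, uniform binary
```

For the noiseless example both quantities equal $\log 2$, so the inequality holds with equality there.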

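Given a source $(X,Y)$, let $(U_0, U_1, X_0, X_1, Y_0, Y_1)$ be the random variables defined by applying the basic transform with $G_\gamma$ to the source $(X,Y)$, as described in Section III. Then, one obtains

\begin{align*}
& H(U_1 \mid Y_1) - H(U_1 \mid U_0, Y_0, Y_1) \\
&= \sum_{u_0 \in \mathbb{F}_q, u_1 \in \mathbb{F}_q, y_0 \in \mathcal{Y}, y_1 \in \mathcal{Y}} P_{U_0,U_1,Y_0,Y_1}(u_0,u_1,y_0,y_1) \log \frac{P_{U_0,U_1,Y_0|Y_1}(u_0,u_1,y_0 \mid y_1)}{P_{U_1|Y_1}(u_1 \mid y_1) P_{U_0,Y_0|Y_1}(u_0,y_0 \mid y_1)} \\
&\ge -\log \sum_{y_1 \in \mathcal{Y}} P_Y(y_1) \sum_{u_0 \in \mathbb{F}_q, y_0 \in \mathcal{Y}} \left[ \sum_{u_1 \in \mathbb{F}_q} P_{U_1|Y_1}(u_1 \mid y_1) \sqrt{P_{U_0,Y_0|U_1,Y_1}(u_0,y_0 \mid u_1,y_1)} \right]^2 \\
&= -\log \sum_{y_1 \in \mathcal{Y}} P_Y(y_1) \sum_{u_0 \in \mathbb{F}_q, y_0 \in \mathcal{Y}} \left[ \sum_{u_1 \in \mathbb{F}_q} P_{U_1|Y_1}(u_1 \mid y_1) \sqrt{P_{X,Y}(u_0+u_1, y_0)} \right]^2 \\
&= -\log \left[ \sum_{y_1 \in \mathcal{Y}} P_Y(y_1) \sum_{u_1 \in \mathbb{F}_q, u_1' \in \mathbb{F}_q} P_{X|Y}(\gamma u_1 \mid y_1) P_{X|Y}(\gamma u_1' \mid y_1) \sum_{u_0 \in \mathbb{F}_q, y_0 \in \mathcal{Y}} \sqrt{P_{X,Y}(u_0+u_1, y_0)} \sqrt{P_{X,Y}(u_0+u_1', y_0)} \right] \\
&= -\log \left[ 1 - \sum_{y_1 \in \mathcal{Y}} P_Y(y_1) \sum_{u_1 \in \mathbb{F}_q, u_1' \in \mathbb{F}_q} P_{X|Y}(\gamma u_1 \mid y_1) P_{X|Y}(\gamma u_1' \mid y_1) \left( 1 - \sum_{u_0 \in \mathbb{F}_q, y_0 \in \mathcal{Y}} \sqrt{P_{X,Y}(u_0+u_1, y_0)} \sqrt{P_{X,Y}(u_0+u_1', y_0)} \right) \right] \\
&= -\log \left[ 1 - q \sum_{d \in \mathbb{F}_q^\times} \sum_{y_1 \in \mathcal{Y}, u_1 \in \mathbb{F}_q} \frac{1}{q} P_Y(y_1) P_{X|Y}(\gamma u_1 \mid y_1) P_{X|Y}(\gamma u_1 + \gamma d \mid y_1) \left( 1 - \sum_{u_0 \in \mathbb{F}_q, y_0 \in \mathcal{Y}} \sqrt{P_{X,Y}(u_0+u_1, y_0)} \sqrt{P_{X,Y}(u_0+u_1+d, y_0)} \right) \right] \\
&\ge -\log \left[ 1 - q \sum_{d \in \mathbb{F}_q^\times} \left( \sum_{y_1 \in \mathcal{Y}, u_1 \in \mathbb{F}_q} \frac{1}{q} P_Y(y_1) \sqrt{P_{X|Y}(\gamma u_1 \mid y_1) P_{X|Y}(\gamma u_1 + \gamma d \mid y_1)} \right)^2 \left( 1 - \sum_{u_0 \in \mathbb{F}_q, y_0 \in \mathcal{Y}} \sqrt{P_{X,Y}(u_0, y_0)} \sqrt{P_{X,Y}(u_0+d, y_0)} \right) \right] \\
&= -\log \left[ 1 - \frac{1}{q} \sum_{d \in \mathbb{F}_q^\times} Z_{\gamma d}(X \mid Y)^2 \left( 1 - Z_d(X \mid Y) \right) \right].
\end{align*}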
The first and second inequalities are obtained by Lemma 30 and Jensen's inequality, respectively.

The assumption of Lemma 15 implies that the above formula evaluated for $(X, Y) = (X_{(n)}, Y_{(n)})$ approaches 0 as $n \to \infty$, or equivalently, that for any $\epsilon > 0$, there exists $n_0$ such that

\[ Z_{\gamma d}(X_{(n)} \mid Y_{(n)})^2 \left( 1 - Z_d(X_{(n)} \mid Y_{(n)}) \right) < \epsilon \]

for any $n \ge n_0$ and any $d \in \mathbb{F}_q^\times$. Fix $\epsilon \in (0, 1/2)$. Then, there exists $n_0$ such that

\[ Z_{\gamma d}(X_{(n)} \mid Y_{(n)}) \left( 1 - Z_d(X_{(n)} \mid Y_{(n)}) \right) < \epsilon^2 \]

for any $n \ge n_0$ and any $d \in \mathbb{F}_q^\times$, which in turn implies

\[ Z_{\gamma d}(X_{(n)} \mid Y_{(n)}) < \epsilon \quad \text{or} \quad 1 - Z_d(X_{(n)} \mid Y_{(n)}) < \epsilon \]

for any $n \ge n_0$ and any $d \in \mathbb{F}_q^\times$. Assume $1 - Z_{d'}(X_{(n')} \mid Y_{(n')}) < \epsilon$ for fixed $n' \ge n_0$ and fixed $d' \in \mathbb{F}_q^\times$. Then, from

\[ Z_{d'}(X_{(n')} \mid Y_{(n')}) \left( 1 - Z_{\gamma^{-1} d'}(X_{(n')} \mid Y_{(n')}) \right) < \epsilon^2 \]

one obtains $1 - Z_{\gamma^{-1} d'}(X_{(n')} \mid Y_{(n')}) < \epsilon^2/(1 - \epsilon) < \epsilon$. By iterating this procedure, one proves that $1 - Z_{\gamma^{-i} d'}(X_{(n')} \mid Y_{(n')}) < \epsilon$ holds for all $i \in \{0, \dots, q-2\}$. In the same way, when $Z_{d'}(X_{(n')} \mid Y_{(n')}) < \epsilon$ is assumed for fixed $n' \ge n_0$ and fixed $d' \in \mathbb{F}_q^\times$, one can prove that $Z_{\gamma^{i} d'}(X_{(n')} \mid Y_{(n')}) < \epsilon$ holds for all $i \in \{0, \dots, q-2\}$. This completes the proof of Lemma 15.

Appendix B 

Bhattacharyya parameter and error probability 

In this appendix, an unconditional version of Lemma 17 is proved. Lemma 17 itself is then proved straightforwardly by Jensen's inequality. For the proof of the unconditional version, one can regard $\mathcal{X}$ as any finite set whose size $q$ is not necessarily a power of a prime. Let $X$ be a random variable on $\mathcal{X}$. The optimum estimator for $X$ minimizing the probability of error is given by $\hat{x} := \arg\max_x P_X(x)$, with the error probability

\[ P_e(X) := 1 - \max_{x \in \mathcal{X}} P_X(x). \]

The Bhattacharyya parameter is defined as

\[ Z(X) := \frac{1}{q-1} \sum_{x \in \mathcal{X},\, x' \in \mathcal{X},\, x' \ne x} \sqrt{P_X(x) P_X(x')}. \]

The following lemma gives an upper bound of the error probability in terms of the Bhattacharyya parameter.

Lemma 31.

\[ P_e(X) \le \min_{k=1,2,\dots,q-1} \frac{(q-1)Z(X) + k(k-1)}{k(k+1)}. \]

Proof: Noting that $P_X(\hat{x}) = 1 - P_e(X)$ holds by the definition, one has

\[ \sum_{x} \sqrt{P_X(x)} = \sqrt{1 - P_e(X)} + \sum_{x \ne \hat{x}} \sqrt{P_X(x)}. \]

In order to prove the lemma, we first find the extremal distribution of $X$ for which $Z(X)$ is minimized with $P_e(X)$ fixed. As we will show, this amounts to minimizing the second term on the right-hand side with respect to $P_X(x)$ under the constraint that the error probability is $P_e(X)$. We thus consider the following minimization problem for $\{p_i\}_{i=0,1,\dots,q-2}$.

\begin{align*}
\text{minimize:} \quad & \sum_i \sqrt{p_i} \\
\text{subject to:} \quad & \sum_i p_i = P_e(X) \\
& 0 \le p_i \le 1 - P_e(X).
\end{align*}

Let $\{p_i^*\}_{i=0,1,\dots,q-2}$ be the optimum solution of the minimization problem. Since $\sqrt{x}$ is a concave function, $p_i^*$ is 0 or $1 - P_e(X)$ except for at most one $i$ [23]. Let $t - 1$ be the number of $p_i^*$s which are equal to $1 - P_e(X)$; then $t = \lfloor 1/(1 - P_e(X)) \rfloor$ holds. The value of $p_i^*$ which is not 0 or $1 - P_e(X)$ is equal to $1 - t(1 - P_e(X))$. Hence,

\[ \sum_x \sqrt{P_X(x)} \ge t \sqrt{1 - P_e(X)} + \sqrt{1 - t(1 - P_e(X))}. \tag{18} \]

By squaring both sides of (18), one obtains the inequality

\[ 1 + (q-1)Z(X) \ge 1 + t(t-1)(1 - P_e(X)) + 2t \sqrt{(1 - P_e(X))(1 - t(1 - P_e(X)))} \]

which gives the minimum achievable value of the Bhattacharyya parameter for a given error probability. The right-hand side of the above inequality is further lower bounded by applying the inequality $1 - P_e(X) \ge 1 - t(1 - P_e(X)) \iff t \ge 1/(1 - P_e(X)) - 1$ to the last term, yielding

\begin{align*}
(q-1)Z(X) &\ge t(t-1)(1 - P_e(X)) + 2t(1 - t(1 - P_e(X))) \\
&= -(1 - P_e(X)) t^2 + (1 + P_e(X)) t. \tag{19}
\end{align*}

Since the quadratic function $-(1 - P_e(X))x^2 + (1 + P_e(X))x$ is concave and takes a maximum at $x = (1 + P_e(X))/(2(1 - P_e(X)))$, which is the center of the unit interval $[P_e(X)/(1 - P_e(X)), 1/(1 - P_e(X))]$ containing $t$, the inequality (19) still holds even if $t$ is replaced by any integer $k = 1, 2, \dots, q-1$. ■
By replacing $t$ by $1/(1 - P_e(X))$ in (19), one obtains a looser but smooth bound

\[ P_e(X) \le \frac{(q-1)Z(X)}{1 + (q-1)Z(X)}. \tag{20} \]

This bound is also obtained from the monotonicity of the Rényi entropy, i.e., $H_{1/2}(X) \ge H_\infty(X)$. These upper bounds are plotted in Fig. 3 for $q = 5$.
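The upper bounds of Lemma 31 and (20) can be checked numerically for random distributions. The sketch below uses the identity $(q-1)Z(X) = (\sum_x \sqrt{P_X(x)})^2 - 1$, which follows from the definition of $Z(X)$; the function names are illustrative.

```python
import math
import random

def error_probability(p):
    return 1.0 - max(p)

def bhattacharyya(p):
    """Z(X) = ((sum_x sqrt(p_x))^2 - 1) / (q - 1)."""
    q = len(p)
    s = sum(math.sqrt(x) for x in p)
    return (s * s - 1.0) / (q - 1)

def lemma31_bound(p):
    """min over k of ((q-1) Z + k(k-1)) / (k(k+1))."""
    q = len(p)
    z = bhattacharyya(p)
    return min(((q - 1) * z + k * (k - 1)) / (k * (k + 1)) for k in range(1, q))

def smooth_bound(p):
    """(q-1) Z / (1 + (q-1) Z), the bound (20)."""
    q = len(p)
    z = bhattacharyya(p)
    return (q - 1) * z / (1.0 + (q - 1) * z)

def random_dist(q):
    w = [random.random() for _ in range(q)]
    s = sum(w)
    return [x / s for x in w]

random.seed(2)
ok = []
for _ in range(200):
    p = random_dist(5)
    pe = error_probability(p)
    ok.append(pe <= lemma31_bound(p) + 1e-12 and pe <= smooth_bound(p) + 1e-12)
```

A distribution that is uniform on $k$ symbols, such as `[0.5, 0.5, 0, 0, 0]`, meets both bounds with equality.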

The next lemma provides a lower bound of the error probability in terms of the Bhattacharyya parameter.

Lemma 32.

\[ P_e(X) \ge \frac{q-1}{q^2} \left( \sqrt{1 + (q-1)Z(X)} - \sqrt{1 - Z(X)} \right)^2. \tag{21} \]

Proof: We start with the same formula as that used as the starting point of the proof of Lemma 31.

\begin{align*}
\sum_x \sqrt{P_X(x)} &= \sqrt{P_X(\hat{x})} + \sum_{x \ne \hat{x}} \sqrt{P_X(x)} \\
&= \sqrt{1 - P_e(X)} + (q-1) \sum_{x \ne \hat{x}} \frac{1}{q-1} \sqrt{P_X(x)} \\
&\le \sqrt{1 - P_e(X)} + (q-1) \sqrt{\frac{1}{q-1} P_e(X)} \\
&= \sqrt{1 - P_e(X)} + \sqrt{(q-1) P_e(X)}. \tag{22}
\end{align*}

The above inequality is obtained from Jensen's inequality. This proof is the same as the proof of Fano's inequality for the Rényi entropy [24]. By squaring both sides of the above inequality, one has

\begin{align*}
1 + (q-1)Z(X) &\le \left( \sqrt{1 - P_e(X)} + \sqrt{(q-1) P_e(X)} \right)^2 \\
Z(X) &\le \left( (q-2) P_e(X) + 2\sqrt{(q-1) P_e(X)(1 - P_e(X))} \right) / (q-1).
\end{align*}

The function

\[ f(x) := \frac{(q-2)x + 2\sqrt{(q-1)x(1-x)}}{q-1} \]

defined for $x \in [0, (q-1)/q)$ is continuous and strictly increasing since

\begin{align*}
f'(x) &= \frac{q-2}{q-1} + \frac{1 - 2x}{\sqrt{q-1}\sqrt{x(1-x)}} \\
f''(x) &= -\frac{1}{2\sqrt{q-1}\,(x(1-x))^{3/2}} < 0
\end{align*}

and $f'((q-1)/q) = 0$. Hence, $f^{-1}(Z(X)) \le P_e(X)$, where the inverse function $f^{-1}(x)$ of $f(x)$ is

\[ f^{-1}(x) = \frac{q-1}{q^2} \left( \sqrt{1 + (q-1)x} - \sqrt{1 - x} \right)^2. \]



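Lemma 17 is obtained from Lemmas 31 and 32 by applying Jensen's inequality. The lower and upper bounds are plotted in Fig. 3 for $q = 5$. The bounds given in Lemma 17 are the tightest among those which are given in terms of the Bhattacharyya parameter only. Tight examples are shown below. The lower bound in Lemma 17 is tight for the $q$-ary symmetric channel, defined by $\mathcal{X} = \mathcal{Y} = \{0, \dots, q-1\}$ and

\[ P_{Y|X}(y \mid x) = \begin{cases} 1 - \epsilon, & \text{if } y = x \\ \epsilon/(q-1), & \text{if } y \ne x \end{cases} \]

for $\epsilon \in [0, (q-1)/q]$. In this case,

\begin{align*}
P_e(X \mid Y) &= \epsilon \\
Z(X \mid Y) &= \frac{q-2}{q-1}\epsilon + 2\sqrt{\frac{\epsilon(1-\epsilon)}{q-1}}
\end{align*}

which satisfies the lower bound with equality. The upper bound in Lemma 17 is tight for the following channel. Let $\mathcal{X} = \{0, \dots, q-1\}$. For fixed $k \in \{1, \dots, q-1\}$, let $\mathcal{Y} = \{a_A\}_{A \in \mathcal{A}_k} \cup \{b_B\}_{B \in \mathcal{A}_{k+1}}$ where $\mathcal{A}_k := \{A \subseteq \mathcal{X} \mid |A| = k\}$, and let

\[ P_{Y|X}(y \mid x) = \begin{cases} (1 - \epsilon)/\binom{q-1}{k-1}, & \text{if } y = a_A \text{ for } A \in \mathcal{A}_k \text{ and } x \in A \\ \epsilon/\binom{q-1}{k}, & \text{if } y = b_B \text{ for } B \in \mathcal{A}_{k+1} \text{ and } x \in B \\ 0, & \text{otherwise} \end{cases} \]

for $\epsilon \in [0, 1]$. This satisfies the upper bound with equality since one has

\begin{align*}
P_e(X \mid Y) &= \frac{k-1}{k} + \frac{\epsilon}{k(k+1)} \\
Z(X \mid Y) &= \frac{k - 1 + \epsilon}{q-1}.
\end{align*}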
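The two tight examples can be verified by direct arithmetic. The sketch below assumes a uniform input, so that for the subset channel the outputs of type $a_A$ occur with total probability $1-\epsilon$ and induce a uniform a posteriori distribution on $k$ symbols, while the outputs of type $b_B$ occur with total probability $\epsilon$ and induce a uniform a posteriori distribution on $k+1$ symbols. The function names are illustrative.

```python
import math

def lower_bound(z, q):
    """f^{-1}(z) = (q-1)/q^2 * (sqrt(1 + (q-1) z) - sqrt(1 - z))^2."""
    return (q - 1) / q**2 * (math.sqrt(1 + (q - 1) * z) - math.sqrt(1 - z)) ** 2

def upper_bound(z, q):
    """min over k of ((q-1) z + k(k-1)) / (k(k+1))."""
    return min(((q - 1) * z + k * (k - 1)) / (k * (k + 1)) for k in range(1, q))

def symmetric_channel(eps, q):
    """(P_e, Z) of the q-ary symmetric channel with uniform input."""
    pe = eps
    z = ((q - 2) * eps + 2 * math.sqrt((q - 1) * eps * (1 - eps))) / (q - 1)
    return pe, z

def subset_channel(eps, q, k):
    """(P_e, Z) of the subset channel: posteriors uniform on k or k+1 symbols."""
    pe = (1 - eps) * (k - 1) / k + eps * k / (k + 1)
    z = ((1 - eps) * (k - 1) + eps * k) / (q - 1)
    return pe, z
```

With $q = 5$, `symmetric_channel(0.2, 5)` gives $Z = 0.55$ and `lower_bound(0.55, 5)` returns 0.2, while `subset_channel(0.3, 5, 2)` gives $P_e = 0.55$ and $Z = 0.325$, for which `upper_bound` also returns 0.55.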



Appendix C 
Proof of Lemma 18

Similarly to Appendix B, it is sufficient to prove an unconditional version of the inequalities in Lemma 18. Let $\mathcal{X}$ be a finite set of size $q$, and let $X$ be a random variable on $\mathcal{X}$. Let

\[ T(X) := \sum_{x \in \mathcal{X}} \left| P_X(x) - \frac{1}{q} \right| \]

be the total variation distance between $P_X$ and the uniform distribution over $\mathcal{X}$.

Let $t := \lfloor 1/(1 - P_e(X)) \rfloor$. The same argument as that of minimizing the concave function in Appendix B applies to minimizing $-T(X)$ given $P_e(X)$, yielding the upper bound

\[ T(X) \le t\left(1 - P_e(X) - \frac{1}{q}\right) + \left| 1 - t(1 - P_e(X)) - \frac{1}{q} \right| + \frac{q - t - 1}{q} =: f_T(P_e(X)). \]

We now derive the concave hull of $f_T(x)$ for obtaining the upper bound of $T(X \mid Y)$. Let $k$ be a positive integer smaller than $q$. When $x$ satisfies $(k-1)/k \le x < k/(k+1)$, one has $k \le 1/(1-x) < k+1$, so that the value of $t = \lfloor 1/(1-x) \rfloor$ is equal to the constant $k$. The function $f_T(x)$ is hence a convex function of $x$ in the interval $(k-1)/k \le x < k/(k+1)$, and the linear interpolation of the values of $f_T(x)$ at the two endpoints $x = (k-1)/k$ and $x \uparrow k/(k+1)$ thus gives the concave hull of $f_T(x)$ for $(k-1)/k \le x < k/(k+1)$. One therefore obtains the inequality



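\[ f_T(x) \le (k+1)k \left[ \left( \frac{k}{k+1} - x \right) f_T\left( \frac{k-1}{k} \right) + \left( x - \frac{k-1}{k} \right) \lim_{x' \uparrow k/(k+1)} f_T(x') \right] \]

for $x$ satisfying $(k-1)/k \le x < k/(k+1)$. By substituting

\[ f_T\left( \frac{k-1}{k} \right) = \frac{2}{q}(q - k), \qquad \lim_{x \uparrow k/(k+1)} f_T(x) = \frac{2}{q}(q - k - 1) \]

one obtains

\begin{align*}
f_T(x) &\le \frac{2}{q}(q - k - 1) + \frac{2}{q} k \left[ k - (k+1)x \right] \\
&= \frac{2(q-1)}{q} - \frac{2}{q} \left[ -(1-x)k^2 + (1+x)k \right]
\end{align*}

and therefore

\[ T(X) \le \frac{2(q-1)}{q} - \frac{2}{q} \left[ -(1 - P_e(X))k^2 + (1 + P_e(X))k \right] \tag{23} \]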



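for $P_e(X)$ satisfying $(k-1)/k \le P_e(X) < k/(k+1)$. As shown in the proof of Lemma 31, the inequality (23) is correct for any $P_e(X) \in [0, (q-1)/q]$. Note that by replacing $k$ by $1/(1 - P_e(X))$, one obtains a looser but smooth upper bound

\[ f_T(P_e(X)) \le \frac{2(q-1)}{q} - \frac{2}{q} \frac{P_e(X)}{1 - P_e(X)}. \]

The unconditional version of the other inequality in Lemma 18 is obtained by applying the triangle inequality, as

\begin{align*}
T(X) &= \left( 1 - P_e(X) - \frac{1}{q} \right) + \sum_{x \ne \hat{x}} \left| P_X(x) - \frac{1}{q} \right| \\
&\ge \left( 1 - P_e(X) - \frac{1}{q} \right) + \left| \sum_{x \ne \hat{x}} \left( P_X(x) - \frac{1}{q} \right) \right| \\
&= \left( 1 - P_e(X) - \frac{1}{q} \right) + \left| P_e(X) - \frac{q-1}{q} \right| \\
&= 2 \left( \frac{q-1}{q} - P_e(X) \right).
\end{align*}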
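The two unconditional bounds on $T(X)$ derived in this appendix can be checked numerically. The sketch below takes the upper bound to be the minimum over $k = 1, \dots, q-1$ of the right-hand side of (23), which is valid because (23) holds for every such $k$; the function names are illustrative.

```python
import random

def total_variation(p):
    """T(X) = sum_x |p_x - 1/q|."""
    q = len(p)
    return sum(abs(x - 1.0 / q) for x in p)

def t_bounds(p):
    """Lower bound 2((q-1)/q - P_e) and the tightest upper bound obtained from (23)."""
    q = len(p)
    pe = 1.0 - max(p)
    lower = 2.0 * ((q - 1) / q - pe)
    upper = min(2.0 * (q - 1) / q - (2.0 / q) * (-(1 - pe) * k * k + (1 + pe) * k)
                for k in range(1, q))
    return lower, upper

def random_dist(q):
    w = [random.random() for _ in range(q)]
    s = sum(w)
    return [x / s for x in w]

random.seed(3)
ok = []
for _ in range(200):
    p = random_dist(5)
    lo, up = t_bounds(p)
    ok.append(lo - 1e-12 <= total_variation(p) <= up + 1e-12)
```

Both bounds are attained by a point mass, where $T(X) = 2(q-1)/q$, and by the uniform distribution, where $T(X) = 0$.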



Appendix D 
Proof of Lemma 21

As before, it is again sufficient to prove an unconditional version of the inequalities in Lemma 21. The unconditional version $S(X)$ of $S(X \mid Y)$ is defined as

\[ S(X) := \frac{1}{q-1} \sum_{w \in \mathbb{F}_q^\times} \left| P_X^*(w) \right|. \]



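For the upper bound, one obtains

\[ (q-1)S(X) = \sum_{w \in \mathbb{F}_q^\times} \left| P_X^*(w) \right| \le \sqrt{q-1} \sqrt{\sum_{w \in \mathbb{F}_q^\times} \left| P_X^*(w) \right|^2} = \sqrt{q(q-1)} \sqrt{\sum_{z \in \mathbb{F}_q} \left( P_X(z) - \frac{1}{q} \right)^2}. \]

Here, the inequality is obtained from the Cauchy-Schwarz inequality $\|p\|_1 \le \sqrt{q-1}\,\|p\|_2$, which holds for $p \in \mathbb{C}^{q-1}$. The last equality holds via Parseval's identity, i.e., since the Fourier transform is unitary up to the constant $\sqrt{q}$. Let $t := \lfloor 1/(1 - P_e(X)) \rfloor$. Then

\begin{align*}
\sqrt{\sum_{x} \left( P_X(x) - \frac{1}{q} \right)^2}
&\le \sqrt{ t \left( 1 - P_e(X) - \frac{1}{q} \right)^2 + \left( 1 - t(1 - P_e(X)) - \frac{1}{q} \right)^2 + \frac{q - t - 1}{q^2} } \\
&= \sqrt{ (1 - P_e(X)) t \left( (1 - P_e(X)) t - P_e(X) \right) - t(1 - P_e(X)) + \frac{q-1}{q} }. \tag{24}
\end{align*}

Since (24) is piecewise convex with respect to $P_e(X)$, its concave hull is

\[ t(t+1) \left[ \left( \frac{t}{t+1} - P_e(X) \right) \sqrt{\frac{q-t}{qt}} + \left( P_e(X) - \frac{t-1}{t} \right) \sqrt{\frac{q-t-1}{q(t+1)}} \right] \]

for $P_e(X) \in [0, (q-1)/q)$. Since this is piecewise linear and concave, $t$ can be replaced by any $k = 1, \dots, q-1$. Note that the following smooth upper bound is obtained by replacing the first $(1 - P_e(X))t$ in (24) by 1.

\[ S(X) \le \sqrt{1 - \frac{q}{q-1} P_e(X)}. \]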
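The unconditional version of the lower bound in Lemma 21 is obtained via the triangle inequality, as

\begin{align*}
(q-1)S(X) + 1 &= \sum_{w \in \mathbb{F}_q} \left| P_X^*(w) \right| = \sum_{w \in \mathbb{F}_q} \left| \sum_{z \in \mathbb{F}_q} P_X(z) \chi(wz) \right| \\
&= \max_{a \in \mathbb{F}_q} \sum_{w \in \mathbb{F}_q} \left| \sum_{z \in \mathbb{F}_q} P_X(z) \chi(w(z - a)) \right| \\
&\ge \max_{a \in \mathbb{F}_q} \left| \sum_{z \in \mathbb{F}_q} P_X(z) \sum_{w \in \mathbb{F}_q} \chi(w(z - a)) \right| \\
&= q \max_{a \in \mathbb{F}_q} P_X(a) = q(1 - P_e(X)).
\end{align*}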

aGF, 

Appendix E 
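Proof of Lemma 28

As in the argument for the binary case in [13, Chapter 5], the MacWilliams identity is useful for the proof. Let $H := G^{-1}$ and $H_{[i]} := [h_0, \dots, h_i]$ where $h_i$ is the $i$-th column of $H$. Furthermore, we let the Fourier transform of the joint probability $P_{X,Y}$ be defined as $P^*_{X,Y}(w, y) := P_Y(y) P^*_{X|Y}(w \mid y)$. The generalized MacWilliams identity is obtained as follows.

\begin{align*}
P_{U_0^i, Y_0^{\ell-1}}(u_0^i, y_0^{\ell-1})
&= \sum_{x_0^{\ell-1} \in \mathbb{F}_q^\ell} \prod_{j=0}^{\ell-1} P_{X,Y}(x_j, y_j) \prod_{k=0}^{i} \mathbb{I}\{ x_0^{\ell-1} h_k = u_k \} \\
&= \frac{1}{q^{i+1}} \sum_{w_0^i \in \mathbb{F}_q^{i+1}} \prod_{j=0}^{\ell-1} \left[ \sum_{x_j \in \mathbb{F}_q} P_{X,Y}(x_j, y_j) \chi\left( x_j \left( \sum_{k=0}^{i} w_k h_k^t \right)_j \right) \right] \prod_{j=0}^{i} \chi(-w_j u_j) \\
&= \frac{1}{q^{i+1}} \sum_{z_0^{\ell-1} \in \mathbb{F}_q^\ell,\, w_0^i \in \mathbb{F}_q^{i+1}} \mathbb{I}\{ w_0^{i} H_{[i]}^t = z_0^{\ell-1} \} \prod_{j=0}^{\ell-1} P^*_{X,Y}(z_j, y_j) \prod_{j=0}^{i} \chi(-w_j u_j).
\end{align*}

Hence the Fourier transform $P^*_{U_i, (Y_0^{\ell-1}, U_0^{i-1})}$ of the joint probability $P_{U_i, (Y_0^{\ell-1}, U_0^{i-1})}$ is given by

\[ P^*_{U_i, (Y_0^{\ell-1}, U_0^{i-1})}(w_i, (y_0^{\ell-1}, u_0^{i-1})) = \frac{1}{q^i} \sum_{z_0^{\ell-1} \in \mathbb{F}_q^\ell,\, w_0^{i-1} \in \mathbb{F}_q^i} \mathbb{I}\{ w_0^{i-1} H_{[i-1]}^t + w_i h_i^t = z_0^{\ell-1} \} \prod_{j=0}^{\ell-1} P^*_{X,Y}(z_j, y_j) \prod_{j=0}^{i-1} \chi(-w_j u_j). \]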

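Then, one can derive the first inequality in Lemma 28 as

\begin{align*}
S_{\max}(X^{(i)}, Y^{(i)})
&= \max_{w_i \in \mathbb{F}_q^\times} \sum_{y_0^{\ell-1} \in \mathcal{Y}^\ell,\, u_0^{i-1} \in \mathbb{F}_q^i} \left| P^*_{U_i, (Y_0^{\ell-1}, U_0^{i-1})}(w_i, (y_0^{\ell-1}, u_0^{i-1})) \right| \\
&= \max_{w_i \in \mathbb{F}_q^\times} \sum_{y_0^{\ell-1} \in \mathcal{Y}^\ell,\, u_0^{i-1} \in \mathbb{F}_q^i} \left| \frac{1}{q^i} \sum_{z_0^{\ell-1} \in \mathbb{F}_q^\ell,\, w_0^{i-1} \in \mathbb{F}_q^i} \mathbb{I}\{ w_0^{i-1} H_{[i-1]}^t + w_i h_i^t = z_0^{\ell-1} \} \prod_{j=0}^{\ell-1} P^*_{X,Y}(z_j, y_j) \prod_{j=0}^{i-1} \chi(-w_j u_j) \right| \\
&\le \max_{w_i \in \mathbb{F}_q^\times} \sum_{z_0^{\ell-1} \in \mathbb{F}_q^\ell,\, w_0^{i-1} \in \mathbb{F}_q^i} \mathbb{I}\{ w_0^{i-1} H_{[i-1]}^t + w_i h_i^t = z_0^{\ell-1} \} \prod_{j=0}^{\ell-1} \sum_{y \in \mathcal{Y}} \left| P^*_{X,Y}(z_j, y) \right| \\
&\le q^i S_{\max}(X, Y)^{D_s^{(i)}(G)}. \tag{25}
\end{align*}

The last inequality in the above is obtained by observing that $z_0^{\ell-1}$ satisfying $w_0^{i-1} H_{[i-1]}^t + w_i h_i^t = z_0^{\ell-1}$ should contain at least $D_s^{(i)}(G)$ nonzero elements, and that $\sum_{y \in \mathcal{Y}} |P^*_{X,Y}(0, y)| = 1$ holds.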
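As for the second inequality in Lemma 28, one has

\begin{align*}
S_{\min}(X^{(i)}, Y^{(i)})
&= \min_{w_i \in \mathbb{F}_q^\times} \sum_{y_0^{\ell-1} \in \mathcal{Y}^\ell,\, u_0^{i-1} \in \mathbb{F}_q^i} \left| P^*_{U_i, (Y_0^{\ell-1}, U_0^{i-1})}(w_i, (y_0^{\ell-1}, u_0^{i-1})) \right| \\
&= \min_{w_i \in \mathbb{F}_q^\times} \max_{a_0^{i-1} \in \mathbb{F}_q^i} \sum_{y_0^{\ell-1} \in \mathcal{Y}^\ell,\, u_0^{i-1} \in \mathbb{F}_q^i} \left| \frac{1}{q^i} \sum_{z_0^{\ell-1} \in \mathbb{F}_q^\ell,\, w_0^{i-1} \in \mathbb{F}_q^i} \mathbb{I}\{ w_0^{i-1} H_{[i-1]}^t + w_i h_i^t = z_0^{\ell-1} \} \prod_{j=0}^{\ell-1} P^*_{X,Y}(z_j, y_j) \prod_{j=0}^{i-1} \chi(-(w_j - a_j) u_j) \right| \\
&\ge \min_{w_i \in \mathbb{F}_q^\times} \max_{a_0^{i-1} \in \mathbb{F}_q^i} \sum_{y_0^{\ell-1} \in \mathcal{Y}^\ell} \left| \sum_{z_0^{\ell-1} \in \mathbb{F}_q^\ell,\, w_0^{i-1} \in \mathbb{F}_q^i} \mathbb{I}\{ w_0^{i-1} H_{[i-1]}^t + w_i h_i^t = z_0^{\ell-1} \} \prod_{j=0}^{\ell-1} P^*_{X,Y}(z_j, y_j) \prod_{j=0}^{i-1} \left( \frac{1}{q} \sum_{u_j \in \mathbb{F}_q} \chi(-(w_j - a_j) u_j) \right) \right| \\
&= \min_{w_i \in \mathbb{F}_q^\times} \max_{a_0^{i-1} \in \mathbb{F}_q^i} \prod_{j=0}^{\ell-1} \sum_{y \in \mathcal{Y}} \left| P^*_{X,Y}\left( \left( a_0^{i-1} H_{[i-1]}^t + w_i h_i^t \right)_j, y \right) \right| \\
&\ge \min_{w_i \in \mathbb{F}_q^\times} \max_{a_0^{i-1} \in \mathbb{F}_q^i} \prod_{j=0}^{\ell-1} S_{\min}(X, Y)^{\mathbb{I}\{ (a_0^{i-1} H_{[i-1]}^t + w_i h_i^t)_j \ne 0 \}} = S_{\min}(X, Y)^{D_s^{(i)}(G)} \tag{26}
\end{align*}

where the last equality in the above is obtained by noting that the maximization with respect to $a_0^{i-1}$ amounts to making the number of nonzero elements in $a_0^{i-1} H_{[i-1]}^t + w_i h_i^t$ as small as possible.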



References 

[1] R. Mori and T. Tanaka, "Channel polarization on q-ary discrete memoryless channels by arbitrary kernels," in Proc. 2010 IEEE Int. Symp. Inf. Theory, Austin, TX, Jun. 13-18, 2010, pp. 894-898.
[2] R. Mori and T. Tanaka, "Non-binary polar codes using Reed-Solomon codes and algebraic geometry codes," in Proc. 2010 IEEE Information Theory Workshop, Dublin, Ireland, Aug. 30-Sep. 3, 2010.
[3] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051-3073, Jul. 2009.
[4] E. Şaşoğlu, E. Telatar, and E. Arıkan, "Polarization for arbitrary discrete memoryless channels," 2009. [Online]. Available: http://arxiv.org/abs/0908.0302v1
[5] S. Korada, E. Şaşoğlu, and R. Urbanke, "Polar codes: Characterization of exponent, bounds, and constructions," IEEE Trans. Inf. Theory, vol. 56, no. 12, pp. 6253-6264, 2010.
[6] E. Arıkan, "Source polarization," in Proc. 2010 IEEE Int. Symp. Inf. Theory, Austin, TX, Jun. 13-18, 2010, pp. 899-903.
[7] H. Cronie and S. Korada, "Lossless source coding with polar codes," in Proc. 2010 IEEE Int. Symp. Inf. Theory, Austin, TX, Jun. 13-18, 2010, pp. 904-908.
[8] N. Hussami, R. Urbanke, and S. Korada, "Performance of polar codes for channel and source coding," in Proc. IEEE Int. Symp. Inf. Theory, Seoul, South Korea, Jun. 28-Jul. 3, 2009, pp. 1488-1492.
[9] E. Abbe and E. Telatar, "Polar codes for the m-user MAC," 2010. [Online]. Available: http://arxiv.org/abs/1002.0777
[10] S. Korada and R. Urbanke, "Polar codes are optimal for lossy source coding," IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1751-1768, 2010.
[11] M. Karzand and E. Telatar, "Polar codes for q-ary source coding," in Proc. 2010 IEEE Int. Symp. Inf. Theory, Austin, TX, Jun. 13-18, 2010, pp. 909-912.
[12] E. Arıkan and E. Telatar, "On the rate of channel polarization," in Proc. IEEE Int. Symp. Inf. Theory, Seoul, South Korea, Jun. 28-Jul. 3, 2009, pp. 1493-1495.
[13] S. Korada, "Polar codes for channel and source coding," Ph.D. dissertation, École Polytechnique Fédérale de Lausanne, 2009. [Online]. Available: http://library.epfl.ch/theses/?nr=4461
[14] R. Mori, "Properties and construction of polar codes," Master's thesis, Kyoto University, 2010. [Online]. Available: http://arxiv.org/abs/1002.3521
[15] S. Hassani, R. Mori, T. Tanaka, and R. Urbanke, "Rate-dependent analysis of the asymptotic behavior of channel polarization," IEEE Trans. Inf. Theory, 2012, to be published. [Online]. Available: http://arxiv.org/abs/1110.0194
[16] T. Tanaka and R. Mori, "Refined rate of channel polarization," in Proc. 2010 IEEE Int. Symp. Inf. Theory, Austin, TX, Jun. 13-18, 2010, pp. 889-893.
[17] S. Hassani and R. Urbanke, "On the scaling of polar codes: I. the behavior of polarized channels," in Proc. 2010 IEEE Int. Symp. Inf. Theory, Austin, TX, Jun. 13-18, 2010, pp. 874-888.
[18] J. Massey, D. Costello, and J. Justesen, "Polynomial weights and code constructions," IEEE Trans. Inf. Theory, vol. 19, no. 1, pp. 101-110, Jan. 1973.
[19] K. Saints and C. Heegard, "On hyperbolic cascaded Reed-Solomon codes," Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, vol. 673, pp. 291-303, May 1993.
[20] F. MacWilliams and N. Sloane, The Theory of Error-Correcting Codes. Amsterdam: North-Holland, 1977.
[21] R. Mori and T. Tanaka, "Performance and construction of polar codes on symmetric binary-input memoryless channels," in Proc. 2009 IEEE Int. Symp. Inf. Theory, Seoul, South Korea, Jun. 28-Jul. 3, 2009, pp. 1496-1500.
[22] R. Gallager, Information Theory and Reliable Communication. New York, NY, USA: John Wiley & Sons, 1968.






[23] M. Feder and N. Merhav, "Relations between entropy and error probability," IEEE Trans. Inf. Theory, vol. 40, no. 1, pp. 259-266, Jan. 1994. 
[24] M. Ben-Bassat and J. Raviv, "Rényi's entropy and the probability of error," IEEE Trans. Inf. Theory, vol. 24, no. 3, pp. 324-331, May 1978.



